(57) There is disclosed an image extraction method
for extracting, from a subject image that records both 
the background and the object to be extracted, image 
data of the object using a mask. An initial mask used for 
extracting the subject region is generated on the basis 
of difference data between the subject image and the 
background image that records the background alone, 
the region of the initial mask is grown on the basis of the 
similarity between the features of a first region and its 
neighboring second region in the subject image corresponding
to the initial mask, and the image data of the
object is extracted from the first image on the basis of 
the grown mask region. 



FIG. 1 (block diagram; legible block labels: LENS, DRIVING CONTROLLER, IMAGE SIGNAL (label truncated), IMAGE RECORDER, IMAGE MEMORY, NORMALIZED EDGE INTENSITY EXTRACTOR, INITIAL MASK REGION EXTRACTOR, INTERFACE UNIT, REGION GROWING, CONTOUR SHAPER, SUBJECT IMAGE OUTPUT UNIT, EXTRACTION APPARATUS, TERMINAL APPARATUS, DISPLAY APPARATUS)



EP 0 817 495 A3 



European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 97 30 4699

DOCUMENTS CONSIDERED TO BE RELEVANT

Category (relevant to claim), followed by the citation of the document with indication, where appropriate, of relevant passages:

Y (claims 1-3,6,11), A (claims 7,12,13)
GAMBOTTO J-P: "A NEW APPROACH TO COMBINING REGION GROWING AND EDGE DETECTION"
PATTERN RECOGNITION LETTERS, vol. 14, no. 11, 1 November 1993 (1993-11-01), pages 869-875, XP000403545
* the whole document *

Y (claims 1-3,6,11), A (claims 4,5)
EP 0 634 872 A (SONY CORP) 18 January 1995 (1995-01-18)
* abstract *
* page 2, line 26 - line 54 *
* page 5, line 27 - page 7, line 27 *
* figure 5 *

A (claims 1-5,9,10)
EP 0 400 998 A (SONY CORP) 5 December 1990 (1990-12-05)
* abstract *
* page 4, line 40 - page 5, line 54 *

A (claims 1-5)
EP 0 671 706 A (NIPPON TELEGRAPH & TELEPHONE) 13 September 1995 (1995-09-13)
* page 2, column 1, line 14 - line 35 *
* page 5, column 7, line 9 - line 18 *

A (claims 1,2)
US 5 519 436 A (MUNSON BILL A) 21 May 1996 (1996-05-21)
* abstract *
* column 3, line 11 - column 6, line 29 *

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): H04N7/26, H04N7/36, G06T5/00
TECHNICAL FIELDS SEARCHED (Int.Cl.6): H04N, G06T

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 10 February 2000
Examiner: Marie-Julie, J-M

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document


Application Number: EP 97 30 4699

CLAIMS INCURRING FEES

The present European patent application comprised at the time of filing more than ten claims.

□ Only part of the claims have been paid within the prescribed time limit. The present European search
report has been drawn up for the first ten claims and for those claims for which claims fees have
been paid, namely claim(s):

□ No claims fees have been paid within the prescribed time limit. The present European search report has
been drawn up for the first ten claims.

LACK OF UNITY OF INVENTION

The Search Division considers that the present European patent application does not comply with the
requirements of unity of invention and relates to several inventions or groups of inventions, namely:

see sheet B

☒ All further search fees have been paid within the fixed time limit. The present European search report has
been drawn up for all claims.

□ As all searchable claims could be searched without effort justifying an additional fee, the Search Division
did not invite payment of any additional fee.

□ Only part of the further search fees have been paid within the fixed time limit. The present European
search report has been drawn up for those parts of the European patent application which relate to the
inventions in respect of which search fees have been paid, namely claims:

□ None of the further search fees have been paid within the fixed time limit. The present European search
report has been drawn up for those parts of the European patent application which relate to the invention
first mentioned in the claims, namely claims:



EUROPEAN SEARCH REPORT (continued)

Application Number: EP 97 30 4699

DOCUMENTS CONSIDERED TO BE RELEVANT

CORTEZ D ET AL: "IMAGE SEGMENTATION TOWARDS NEW IMAGE REPRESENTATION METHODS"
SIGNAL PROCESSING. IMAGE COMMUNICATION, vol. 6, no. 6, 1 February 1995 (1995-02-01), pages 485-498, XP000491855
* the whole document *

LETTERA C ET AL: "FOREGROUND/BACKGROUND SEGMENTATION IN VIDEOTELEPHONY"
SIGNAL PROCESSING. IMAGE COMMUNICATION, vol. 1, no. 2, 1 October 1989 (1989-10-01), pages 181-189, XP000234867
* the whole document *

EP 0 532 823 A (INST PERSONALIZED INFORMATION) 24 March 1993 (1993-03-24)
* abstract *
* page 2, column 2, line 56 - page 3, column 4, line 33 *

KASS M ET AL: "SNAKES: ACTIVE CONTOUR MODELS"
INTERNATIONAL JOURNAL OF COMPUTER VISION, 1988, pages 321-331, XP000675014

HUANG Q ET AL: "FOREGROUND/BACKGROUND SEGMENTATION OF COLOR IMAGES BY INTEGRATION OF MULTIPLE CUES"
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. (ICIP), US, LOS ALAMITOS, IEEE COMP. SOC. PRESS, 1995, pages 246-249, XP000624221
ISBN: 0-7803-3122-2
* the whole document *

Category (as printed): A, Y
Relevant to claim (as printed, in order): 1,6; 1,13-19; 1,13-19,31-36; 1,13,20,21; 22,24,36; 24-28,31-35; 29,30



LACK OF UNITY OF INVENTION
SHEET B

Application Number: EP 97 30 4699

The Search Division considers that the present European patent application does not comply with the
requirements of unity of invention and relates to several inventions or groups of inventions, namely:

1. Claims: 1-21

An image extraction method comprising a region growing step
characterised by generating a mask on the basis of
difference data between a first and a second image.

2. Claims: 22-36

An image extraction method comprising region growing
characterised by the use of an adaptive threshold value.



EUROPEAN SEARCH REPORT (continued)

Application Number: EP 97 30 4699

DOCUMENTS CONSIDERED TO BE RELEVANT

WALDOWSKI M: "A NEW SEGMENTATION ALGORITHM FOR VIDEOPHONE APPLICATIONS BASED ON STEREO IMAGE PAIRS"
IEEE TRANSACTIONS ON COMMUNICATIONS, US, IEEE INC. NEW YORK, vol. 39, no. 12, 1 December 1991 (1991-12-01), pages 1856-1868, XP000268153
ISSN: 0090-6778
* paragraph [01 II] *

KIM C ET AL: "MOVING TARGET EXTRACTION ALGORITHM FOR SELECTIVE CODING"
PROCEEDINGS SPIE - INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING, 1996, USA, vol. 2727, no. 3, 17 March 1996 (1996-03-17), pages 1130-1139, XP002130220
ORLANDO
* paragraph [0002] *

YEMEZ Y ET AL: "REGION GROWING MOTION SEGMENTATION AND ESTIMATION IN OBJECT-ORIENTED VIDEO CODING"
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), US, NEW YORK, IEEE, 1996, pages 521-524, XP000733293
ISBN: 0-7803-3259-8
* paragraph [0003] *

Category (as printed): A, X
Relevant to claim (as printed, in order): 30; 22; 29; 22; 22,36



EUROPEAN SEARCH REPORT (continued)

Application Number: EP 97 30 4699

DOCUMENTS CONSIDERED TO BE RELEVANT

CHU C-C ET AL: "THE INTEGRATION OF IMAGE SEGMENTATION MAPS USING REGION AND EDGE INFORMATION"
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, US, IEEE INC. NEW YORK, vol. 15, no. 12, 1 December 1993 (1993-12-01), pages 1241-1252, XP000420359
ISSN: 0162-8828
* paragraph [00IV] - paragraph [00VI] *
Relevant to claim: 31-35

EP 0 706 155 A (SONY CORP) 10 April 1996 (1996-04-10)
* page 8, line 42 - page 9, line 31 *
Relevant to claim: 31-35



ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 97 30 4699

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report.
The members are as contained in the European Patent Office EDP file on 10-02-2000.
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.

Patent document cited    Publication    Patent family      Publication
in search report         date           member(s)          date

EP 0634872 A             18-01-1995     JP 7079440 A       20-03-1995
                                        US 5706367 A       06-01-1998

EP 0400998 A             05-12-1990     JP 3002984 A       09-01-1991
                                        DE 69024539 D      15-02-1996
                                        DE 69024539 T      30-05-1996
                                        US 5097327 A       17-03-1992

EP 0671706 A             13-09-1995     JP 7302328 A       14-11-1995
                                        DE 69510252 D      22-07-1999
                                        DE 69510252 T      11-11-1995
                                        US 5748775 A       05-05-1998

US 5519436 A             21-05-1996     NONE

EP 0532823 A             24-03-1993     JP 2856229 B       10-02-1999
                                        JP 5081429 A       02-04-1993
                                        US 5471535 A       28-11-1995

EP 0706155 A             10-04-1996     US 5835237 A       10-11-1998
                                        CN 1127562 A       24-07-1996
                                        WO 9529462 A       02-11-1995

For more details about this annex : see Official Journal of the European Patent Office, No. 12/82



(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 817 495 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 07.01.1998 Bulletin 1998/02

(51) Int Cl.6: H04N 7/26

(21) Application number: 97304699.8

(22) Date of filing: 30.06.1997

(84) Designated Contracting States: AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States: AL LT LV RO SI

(30) Priority: 05.07.1996 JP 194113/96

(71) Applicant: CANON KABUSHIKI KAISHA
     Tokyo (JP)

(72) Inventors:
     • Matsugu, Masakazu
       Ohta-ku, Tokyo (JP)
     • Takiguchi, Hideo
       Ohta-ku, Tokyo (JP)
     • Hatanaka, Koji
       Ohta-ku, Tokyo (JP)

(74) Representative: Beresford, Keith Denis Lewis et al
     BERESFORD & Co.
     2-5 Warwick Court
     High Holborn
     London WC1R 5DJ (GB)

(54) Image subject extraction apparatus and method
Description

BACKGROUND OF THE INVENTION 
Field of the Invention

The present invention relates to an image extraction apparatus and method for extracting a target subject from a 
background image and a subject image. More particularly, the present invention is directed to a method and apparatus 
for appropriately generating a mask used for extracting a target subject. 

Related Arts 

Conventionally, as general techniques for realizing image extraction, a chromakey method using a specific color
background, a videomatte method for generating a key signal by performing a histogram process, difference (or
differential) process, contour enhancement or contour tracking process of an image signal (The Television Society
Technical Report, Vol. 12, pp. 29 - 34, 1988), and the like are known.

A technique for performing image extraction based on the difference from the background image is a state-of-the-art
one, and for example, Japanese Patent Laid-Open No. 4-216181 discloses a technique for detecting or extracting
a target object in a plurality of specific regions in an image by setting a mask image (i.e., a specific processing region)
in difference data between the background image and the image to be processed.

Furthermore, Japanese Patent Publication No. 7-16250 discloses a technique for obtaining color-converted data
of an original image including a background using a color model of the object to be extracted, and the existence
probability distribution of the object to be extracted from brightness difference data between the background image and
the original image.

In the difference method from the background image, the luminance level or color component difference between
the pixels of the background image and the subject image is normally expressed by a predetermined evaluation function,
and the evaluation function is subjected to a thresholding process to extract a region having a difference level equal
to or higher than an initial value. As the evaluation function, the correlation between blocks having individual points as
centers and a predetermined size (Rosenfeld, A. and Kak, A.C., Digital Picture Processing (2nd ed.), Academic Press,
1982), normalized principal component features (Journal of the Institute of Electronics, Information and Communication
Engineers, Vol. J74-D-II, pp. 1731 - 1740), a weighted sum value of a standard deviation and a difference value (Journal
of the Television Society, Vol. 45, pp. 1270 - 1276, 1991), a local histogram distance associated with hue and luminance
level (Journal of the Television Society, Vol. 49, pp. 673 - 680, 1995), and the like are used.

Japanese Patent Laid-Open No. 4-328689 and Japanese Patent Publication No. 7-31248 disclose a method of
extracting a moving object alone by extracting motion vectors or inter-frame difference data from moving images.
Japanese Patent Publication Nos. 7-66446, 6-14358, and 4-48030 disclose a method of extracting a moving object
based on the difference from the background image. Furthermore, a method of extracting the binocular disparity
distribution (i.e., the distance distribution from image sensing means) from images from right and left different view point
positions obtained using a binocular image sensing system, and segmenting an object from the background on the
basis of the disparity distribution (1995 Information System Society Meeting of the Society of Electronics, Information
and Communication Engineers, pp. 138), or the like is known.

However, of the above-mentioned prior arts, the chromakey method suffers from the following problems:

i: this method cannot be used outdoors due to serious background limitations, and
ii: color omission occurs.

Also, the videomatte method suffers from the following problems:

i: the contour designation must be manually and accurately performed in units of pixels, and
ii: such operation requires much labor and skill.

Furthermore, the difference method from the background image is normally difficult to realize due to the following
problems:

i: the background is hard to distinguish from the subject in a partial region of the subject including a portion similar
to the background,
ii: the difference method is readily influenced by variations in image sensing condition between the background
image and subject image,
iii: a shadow portion formed by the subject is hard to remove, and
iv: in order to faithfully extract the boundary line between the background and subject, the background image and
subject image must have considerably different image characteristics (pixel values and the like) in the vicinity of
the boundary therebetween.

The technique disclosed in Japanese Patent Publication No. 7-16250 is not suitable for image extraction of an
arbitrary unknown object since it requires a color model for the object to be extracted.

In either the method of extracting a moving object from moving images or the method of extracting a subject from
the disparity distribution, it is generally hard to extract a subject with high precision independently of the contrast in
the boundary portion between the subject and background.

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide an image extraction apparatus and method, which can stably
extract a subject image in which the background and subject have no distinct difference between their image
characteristics.

It is another object of the present invention to provide an image extraction apparatus and method which can obtain
a large area of a subject region before region growing by a small number of processing steps, and can extract details
of a contour shape.

It is still another object of the present invention to provide an image extraction apparatus and method which can
execute a process for equalizing the contour line of a mask after region growing with that of an actual subject without
being influenced by the background pattern near the contour line of the subject.

It is still another object of the present invention to provide an image extraction apparatus and method which can
stably grow an initial mask only in a subject region independently of variations in the region growing condition, i.e., the
tolerance value of a feature difference from a neighboring region.

It is still another object of the present invention to provide an image extraction apparatus and method which can
suppress variations in edge intensity distribution caused by a difference in image sensing conditions between the
background image and the subject image, noise, or the like, and can accurately extract the contour shape of the subject
and the edge of a background portion present in the subject region.

It is still another object of the present invention to provide an image extraction apparatus and method which can
stably extract a subject image even when the edge intensity serving as a boundary between the subject and background
is small and the subject includes a relatively thin shape.

It is still another object of the present invention to provide an image extraction apparatus and method which can
stably extract the contour shape of a subject without being influenced by the edge distribution of a background portion
present in the vicinity of the subject.

It is still another object of the present invention to provide an image extraction apparatus and method which can
automatically retrieve an incomplete partial shape after region growing on the basis of the condition of shape continuity
and can smooth shape data.

It is still another object of the present invention to provide an image extraction apparatus and method which can
stably extract a subject image independently of any specific difference between the image characteristics of the
background and subject without being influenced by the background pattern.

It is still another object of the present invention to provide an image extraction apparatus and method which can
stably and accurately extract a subject image upon executing extraction based on region growing.

It is still another object of the present invention to provide an image extraction apparatus and method which can
obtain an extracted image with stably high precision independently of any specific difference between the image
characteristics of the background and subject upon executing extraction based on the difference from the background
image.

It is still another object of the present invention to provide an image extraction apparatus and method which can
extract a subject on the basis of region growing that can faithfully reconstruct the contour shape of the object to be
extracted.

It is still another object of the present invention to provide an image extraction apparatus and method which can
extract a region closest to a subject while suppressing unlimited region growing.

It is still another object of the present invention to provide an image extraction apparatus and method which can
obtain stably high extraction precision even for a subject having a complicated contour shape by suppressing region
growing across an edge and region growing from an edge.

It is still another object of the present invention to provide an image extraction apparatus and method which can
obtain stably high extraction performance even in the presence of noise such as a shadow present outside a subject
(in the background) or an unclear portion of the contour of the subject.

It is still another object of the present invention to provide an image extraction apparatus and method which can
realize region growing that can satisfactorily approximate the outer shape of the extracted subject to a correct subject
shape even when the shape of a partial region extracted in advance does not match the contour shape of the subject.

It is still another object of the present invention to provide an image extraction apparatus and method which can
realize automatic extraction of a specific subject from moving images with high precision.

It is still another object of the present invention to provide an image extraction apparatus and method which can
realize automatic extraction of a specific subject with high precision using a plurality of images obtained from different
view points.

In order to achieve the above objects, according to the present invention, there is provided an image extraction
method for extracting, from a first image that records both a background and an object to be extracted, image data of
the object using a mask, comprising:

the first step of generating an initial mask for extracting an image of the object on the basis of difference data
between the first image and a second image that records the background alone;
the second step of growing a region of the generated initial mask on the basis of a similarity between features of
a first region of the first image corresponding to the initial mask, and a second region in the vicinity of the first
region; and
the third step of extracting the image data of the object from the first image on the basis of the grown mask region.

According to the image extraction method, subject extraction that can eliminate the influence of noise and variations
in image sensing condition, and automatically removes any light shadow portion can be realized. Also, a subject region
including a region having image characteristics similar to those of a background image can be extracted in the subject.
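
By way of a non-limiting illustration, the three steps above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: images are taken to be grayscale lists of rows, the feature similarity is assumed to be a plain pixel-value difference between 4-connected neighbors, and the names `initial_mask`, `grow_mask`, `extract`, `diff_threshold` and `tolerance` are hypothetical.

```python
from collections import deque


def initial_mask(first, second, diff_threshold):
    """First step: binarize the absolute difference between the two images."""
    return [[abs(a - b) >= diff_threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def grow_mask(first, mask, tolerance):
    """Second step: add 4-neighbors whose value is within `tolerance` of a masked pixel."""
    height, width = len(first), len(first[0])
    grown = [row[:] for row in mask]
    queue = deque((y, x) for y in range(height) for x in range(width) if grown[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and not grown[ny][nx] \
                    and abs(first[ny][nx] - first[y][x]) <= tolerance:
                grown[ny][nx] = True
                queue.append((ny, nx))
    return grown


def extract(first, mask, fill=0):
    """Third step: keep masked pixels and replace everything else with `fill`."""
    return [[v if m else fill for v, m in zip(row_v, row_m)]
            for row_v, row_m in zip(first, mask)]


second = [[10] * 5 for _ in range(3)]                   # background alone
first = [[10] * 5, [10, 60, 45, 32, 10], [10] * 5]      # background plus object
seed = initial_mask(first, second, diff_threshold=30)   # misses the pixel valued 32
grown = grow_mask(first, seed, tolerance=15)            # recovers the pixel valued 32
print(extract(first, grown)[1])                         # prints [0, 60, 45, 32, 0]
```

In this toy example the object pixel valued 32 is too close to the background to survive the difference threshold, but it is recovered by region growing because it is similar to its already masked neighbor.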

In order to achieve the above objects, according to the present invention, there is provided an image extraction
method comprising:

the partial region extraction step of extracting a partial region as a portion of a subject to be extracted from an
input image;
the region growing step of growing the extracted partial region using the extracted partial region as a seed by
thresholding a similarity to a neighboring region, the threshold value being set on the basis of a feature
distribution at individual points of the input image; and
the extraction step of extracting an image of the subject on the basis of the region after region growing.

According to the image extraction method, a subject image can be extracted with stably high precision
independently of variations in parameters used in similarity evaluation, a shadow in the background, and complexity of the image
pattern of the subject upon executing extraction based on region growing.
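
As a non-limiting illustration of a threshold value that is set point by point, the following minimal sketch may be considered. The rule used here is an assumption made only for this example and is not taken from the text: the feature at each point is the value range in a 3x3 window, and a stricter tolerance is applied where that range is large. The names `local_range`, `threshold_map`, `grow_from_seed`, `flat_tol`, `edge_tol` and `edge_level` are hypothetical.

```python
from collections import deque


def local_range(image, y, x):
    """Feature at one point: max minus min over the 3x3 window (clipped at borders)."""
    height, width = len(image), len(image[0])
    window = [image[j][i]
              for j in range(max(0, y - 1), min(height, y + 2))
              for i in range(max(0, x - 1), min(width, x + 2))]
    return max(window) - min(window)


def threshold_map(image, flat_tol, edge_tol, edge_level):
    """Per-point threshold: the stricter `edge_tol` where the local range is large."""
    return [[edge_tol if local_range(image, y, x) >= edge_level else flat_tol
             for x in range(len(image[0]))]
            for y in range(len(image))]


def grow_from_seed(image, seed, thresholds):
    """Grow the seed; a neighbor joins if it differs by at most its own threshold."""
    height, width = len(image), len(image[0])
    grown = [row[:] for row in seed]
    queue = deque((y, x) for y in range(height) for x in range(width) if grown[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and not grown[ny][nx] \
                    and abs(image[ny][nx] - image[y][x]) <= thresholds[ny][nx]:
                grown[ny][nx] = True
                queue.append((ny, nx))
    return grown


image = [[50, 52, 54, 56, 48, 40, 40]]
seed = [[True, False, False, False, False, False, False]]
adaptive = threshold_map(image, flat_tol=8, edge_tol=3, edge_level=8)
fixed = [[8] * 7]
print(grow_from_seed(image, seed, adaptive)[0])  # growth stops after the fourth pixel
print(grow_from_seed(image, seed, fixed)[0])     # growth spreads over the whole row
```

With the fixed tolerance the region leaks across the gradual boundary, whereas the point-wise threshold stops growth where the local feature distribution indicates a transition.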

In order to achieve the above objects, according to the present invention, there is provided an image extraction
apparatus for extracting, from a first image including both a background and an object to be extracted, image data of
the object using a mask, comprising:

temporary storage means for receiving and temporarily storing the first image and a second image that records
the background;
initial mask generating means for generating an initial mask of an extraction region on the basis of difference data
between the stored first and second images;
region growing means for growing a region of the initial mask on the basis of a feature similarity to a neighboring
region; and
first image extraction means for extracting the image data of the object from the first image on the basis of the
grown mask region.

According to the image extraction apparatus, upon extraction of an initial mask, the influence of noise and variations
in image sensing condition can be eliminated, and any light shadow portion can be automatically removed. Also, a
subject region can be stably and automatically extracted independently of the presence/absence of a region similar to
a background image in the subject.

In order to achieve the above objects, according to the present invention, there is provided an image extraction
apparatus comprising:

partial region extraction means for extracting a partial region as a portion of a subject to be extracted from an input
image;
region growing means for growing the extracted partial region using the extracted partial region as a seed by
thresholding a similarity to a neighboring region, the threshold value being set on the basis of a feature
distribution at individual points of the input image; and
extraction means for extracting an image of the subject on the basis of the region after region growing.

According to the image extraction apparatus, a subject image can be extracted with stably high precision
independently of variations in parameters used in similarity evaluation, a shadow in the background, and complexity of the
image pattern of the subject upon executing extraction based on region growing.

According to a preferred aspect of the present invention, the first step includes the step of using as the initial mask
a binary image region obtained by a binarization process of difference data representing a difference between image
data of the first and second images using a predetermined threshold value. The details of the subject shape can be
extracted in a process before region growing while eliminating the influence of noise and the like.

According to a preferred aspect of the present invention, the difference data represents a brightness difference
between the first and second images.

According to a preferred aspect of the present invention, the difference data represents a color difference between
the first and second images.

According to a preferred aspect of the present invention, the first step comprises:

the step of obtaining a first binary image region by a binarization process of data representing a brightness
difference between the first and second images using a predetermined threshold value;
the step of obtaining a second binary image region by a binarization process of data representing a color difference
between the first and second images using a predetermined threshold value; and
the step of generating the initial mask by combining the first and second binary image regions.
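
The combination of a brightness-based and a color-based binary region may be illustrated, without limitation, by the following minimal sketch. Several choices here are assumptions for illustration only: pixels are RGB tuples, brightness is the channel mean, the color difference is measured on normalized chromaticity, and the two binary regions are combined by a logical OR. All function and parameter names are hypothetical.

```python
def brightness(pixel):
    """Assumed brightness measure: mean of the three channels."""
    return sum(pixel) / 3.0


def chromaticity(pixel):
    """Assumed color measure: channels divided by their sum (gray for black pixels)."""
    total = sum(pixel)
    return tuple(c / total for c in pixel) if total else (1 / 3, 1 / 3, 1 / 3)


def brightness_mask(first, second, threshold):
    """First binary region: brightness difference at or above the threshold."""
    return [[abs(brightness(p) - brightness(q)) >= threshold
             for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(first, second)]


def color_mask(first, second, threshold):
    """Second binary region: largest chromaticity difference at or above the threshold."""
    return [[max(abs(u - v) for u, v in zip(chromaticity(p), chromaticity(q))) >= threshold
             for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(first, second)]


def combine(mask_a, mask_b):
    """Initial mask: a pixel is kept if either binary region contains it (assumed OR)."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


second = [[(100, 100, 100)] * 3]                               # gray background
first = [[(100, 100, 100), (200, 200, 200), (150, 100, 50)]]   # unchanged, brighter, recolored
bright = brightness_mask(first, second, threshold=30)
color = color_mask(first, second, threshold=0.1)
print(combine(bright, color)[0])                               # prints [False, True, True]
```

The third pixel has the same brightness as the background and is detected only by the color difference, which is why the two binary regions complement each other in this example.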

According to a preferred aspect of the present invention, the second step includes the step of judging, based on
brightness and hue similarities between the first and second regions, whether a pixel in the second region is to be
incorporated in the first region, and growing the mask region upon incorporating the pixel.

According to a preferred aspect of the present invention, the second step comprises:

the step of respectively extracting first and second edge intensity images from the first and second images;
the step of calculating an edge density on the basis of data representing a difference between the first and second
edge intensity images; and
the step of suppressing growing of the mask when the calculated edge density is not more than a predetermined
threshold value in a growing direction.

Even when the region growing condition is relaxed or roughly set, region
growing outside the subject can be suppressed, and high-precision subject extraction can be realized. Also, even
when the initial mask region includes a region other than the subject (e.g., a shadow portion), growing from such
region can be suppressed.
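
The edge-density test may be illustrated, without limitation, by the following minimal sketch. The edge operator (sum of absolute forward differences), the binarization level `edge_level`, the square window of half-width `radius`, and all function names are assumptions made only for this example.

```python
def edge_intensity(image):
    """Assumed edge operator: sum of absolute forward differences in x and y."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(image[y][x + 1] - image[y][x]) if x + 1 < width else 0
            dy = abs(image[y + 1][x] - image[y][x]) if y + 1 < height else 0
            out[y][x] = dx + dy
    return out


def edge_difference_mask(first, second, edge_level):
    """Binarized difference between the first and second edge intensity images."""
    e1, e2 = edge_intensity(first), edge_intensity(second)
    return [[abs(a - b) >= edge_level for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(e1, e2)]


def edge_density(edge_mask, y, x, radius):
    """Fraction of edge pixels in the window centered on the growth candidate."""
    height, width = len(edge_mask), len(edge_mask[0])
    window = [edge_mask[j][i]
              for j in range(max(0, y - radius), min(height, y + radius + 1))
              for i in range(max(0, x - radius), min(width, x + radius + 1))]
    return sum(window) / len(window)


def growth_allowed(edge_mask, y, x, radius, density_threshold):
    """Growth is suppressed when the density is not more than the threshold."""
    return edge_density(edge_mask, y, x, radius) > density_threshold


second = [[10] * 8]                              # plain background
first = [[40, 70, 40, 70, 10, 10, 10, 10]]       # textured object in columns 0 to 3
edges = edge_difference_mask(first, second, edge_level=20)
print([growth_allowed(edges, 0, x, radius=1, density_threshold=0.5) for x in range(8)])
# prints [True, True, True, True, False, False, False, False]
```

In this example growth toward the background columns is refused regardless of how loosely the similarity tolerance of the region growing itself is set, since the edge difference data is sparse there.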

According to a preferred aspect of the present invention, the first step comprises:
the step of normalizing the difference data representing the difference between the first and second images, and
generating the initial mask on the basis of normalized brightness difference data. In object extraction, the influence of
slight variations in image sensing condition (white balance characteristics, illumination characteristics, exposure
condition, and the like) between the first and second images can be suppressed.

According to a preferred aspect of the present invention, the first step comprises: 

the step of extracting first and second edge intensity images representing edge intensities of the first and second
images, respectively; and
the step of normalizing both the first and second edge intensity images using a predetermined normalization
coefficient when the first edge intensity image is an image having a small number of edges, the normalization
coefficient being a maximum intensity value of the first edge intensity image. For this reason, even when the first and
second images suffer slight variations in image sensing condition (white balance characteristics, illumination
characteristics, exposure condition, and the like), edge intensity variations can be prevented from being amplified. In
this manner, the probability of background edge data being left in a region outside a subject in edge difference
data can be made very low.

According to a preferred aspect of the present invention, the first step comprises:

the step of extracting first and second edge intensity images representing edge intensities of the first and second 
images, respectively; and 
the step of normalizing both the first and second edge intensity images using a maximum edge intensity value
within a predetermined size region having a predetermined point of the first edge intensity image as a center when
the first edge intensity image is an image having many edges. Accordingly, when the subject has a fine partial
shape, the contour shape of details can be stably extracted even when the edge intensity is low, and noise
amplification in a low-contrast partial region in the vicinity of the subject can be suppressed upon normalization.

According to a preferred aspect of the present invention, the second step includes the step of comparing differences
between brightness and hue values of the first and second regions with predetermined threshold values, and
determining that the second region is similar to the first region when the differences are smaller than the predetermined
threshold values. Accordingly, when the contour shape is incomplete (e.g., it includes discontinuous uneven portions
different from the actual shape) as a result of region growing, correction of such shape can be performed while
automatically considering the image feature's continuity and shape continuity in the subject.

According to a preferred aspect of the present invention, the second step further comprises the fourth step of 
shaping a contour line of the grown mask, and the fourth step comprises: 

the step of detecting the contour line of the grown mask;
the step of generating an edge intensity image representing a difference between the first and second images;
the step of setting a region having a predetermined width in a direction perpendicular to an extending direction of
the contour line in the edge intensity image;
the step of selecting a plurality of pixels of the edge intensity images in the region of the predetermined width as
contour point candidates; and
the step of selecting one contour point on the basis of continuity between a pixel on the contour line and the plurality
of contour point candidates, thereby shaping the contour line of the mask. Accordingly, when the contour shape
is incomplete (e.g., it includes discontinuous uneven portions different from the actual shape) as a result of region
growing, correction of such shape can be performed while automatically considering the image feature continuity
and shape continuity in the subject.

According to a preferred aspect of the present invention, the continuity is determined by inspecting pixel value
continuity.

According to a preferred aspect of the present invention, the continuity is determined by inspecting shape continuity.

According to a preferred aspect of the present invention, the continuity is determined by inspecting continuity with 
a pixel present inside the contour line. 

According to a preferred aspect of the present invention, the continuity is determined by weighting and evaluating 
pixel value continuity and shape continuity. 

According to a preferred aspect of the present invention, the fourth step further includes the step of smoothing the
shaped contour line.

According to a preferred aspect of the present invention, the fourth step comprises: 
the active contour shaping step of recursively executing a process for deforming or moving a contour shape of
the mask to minimize a predetermined evaluation function on the basis of the initial mask or a contour of the grown
mask, and image data of the first image. Accordingly, the shape of a non-grown region that remains as a result of
region growing can be corrected and retrieved.

According to a preferred aspect of the present invention, the active contour shaping step comprises: 
generating a contour line by performing an active contour shaping process on the data of the initial mask, and
performing an active contour shaping process of the image data of the first image on the basis of the generated contour
line. Hence, the contour shape of the subject can be normally extracted without being influenced by the background.

According to a preferred aspect of the present invention, the partial region extraction step includes the step of
extracting the partial region on the basis of a difference between a background image excluding the subject, and a
subject image including the subject. Consequently, the extracted image can be obtained with stably high precision
independently of any specific difference between the image characteristics of the background and subject in subject
extraction based on the difference from the background image and region growing.

According to a preferred aspect of the present invention, the feature distribution is an edge distribution of the 
subject. As a result, the contour shape of a subject can be faithfully reconstructed by suppressing unlimited growing 
in the vicinity of an edge upon executing region growing. 

According to a preferred aspect of the present invention, the feature distribution is a distribution within a maximum
growing range set based on the partial region. Accordingly, region growing that can eliminate the influence of noise,
shadows, and illumination conditions, and can roughly obtain the subject shape can be realized inside a partial region
and a region in the vicinity of the partial region.

According to a preferred aspect of the present invention, the threshold value is set to assume a value that
suppresses growing of the region at an edge position as compared to a non-edge position. Thus, region growing outside an
edge, and region growing having an edge as a start point can be suppressed, and the contour shape of a subject after
region growing can be stabilized.

According to a preferred aspect of the present invention, the threshold value is set to assume a value that promotes
growing of the region in a region within the maximum growing range, and to assume a value that suppresses growing
of the region outside the maximum growing range. Hence, extraction faithful to the subject shape can be realized even
in a partial region having a low-contrast boundary from the background, and a partial region with a shadow.

According to a preferred aspect of the present invention, the maximum growing range is obtained as an output
when a shape of the partial region is smoothed using a smoothing filter having a predetermined size. Accordingly, even
when the shape of a partial region extracted in advance has a missing portion or protruding portion, and has a large
local difference from the subject shape, region growing that can relax the influence of such difference can be realized.

According to a preferred aspect of the present invention, the input image includes time-serial images, and the
partial region extraction step includes the step of extracting the partial region on the basis of difference data between
image frames at different times of the input image. As a consequence, a subject that moves in an image can be
automatically extracted with high precision based on the distribution of motion vectors.

According to a preferred aspect of the present invention, the input image includes a plurality of images from a
plurality of different view point positions, and the partial region extraction step includes the step of extracting the partial
region on the basis of a disparity distribution between the input images. Accordingly, a specific subject can be
automatically extracted with high precision based on the distribution of subject distances.

Other features and advantages of the present invention will be apparent from the following description taken in 
conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts 
throughout the figures thereof. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing the arrangement of an image sensing system in the first embodiment;
Fig. 2 is a view for explaining the relationship between a mask and subject image;
Fig. 3 is a flow chart showing the subject extraction process sequence of the first embodiment;
Figs. 4A and 4B show pictures of halftone images that respectively represent a subject image and background image;
Figs. 5A and 5B show pictures of halftone images that represent principal intermediate results of the subject extraction process;
Fig. 6 shows a picture of a halftone image that represents a principal intermediate result of the subject extraction process;
Figs. 7A and 7B show pictures of halftone images that represent principal intermediate results of the subject extraction process;
Figs. 8A and 8B show pictures of halftone images that represent principal intermediate results of the subject extraction process;
Fig. 9 shows a picture of a halftone image that represents the result of the subject extraction process;
Fig. 10 is a flow chart showing the region growing process sequence in step S30 in Fig. 3;
Fig. 11 is a view for explaining the process of region growing of the first embodiment;
Fig. 12 is a view for explaining the condition for stopping contour growing in the first embodiment;
Fig. 13 is a flow chart showing the contour shaping process sequence in the first embodiment;
Fig. 14 is an explanatory view showing the edge selection process of the first embodiment;
Fig. 15 is a view for explaining the processing principle of color continuity evaluation in the first embodiment;
Fig. 16 is a view for explaining the processing principle of shape continuity evaluation in the first embodiment;
Fig. 17 is a flow chart showing the active contour shaping process sequence according to the second embodiment;
Fig. 18 is a view for explaining the operation of the active contour shaping process;
Fig. 19 is a block diagram showing the arrangement of an image sensing system in the third embodiment;
Figs. 20A and 20B are flow charts showing the subject extraction process sequence of the third embodiment;
Figs. 21A and 21B show pictures of images that represent principal intermediate results of the subject extraction process in the third embodiment;
Fig. 22 is a view for explaining the operation principle of contour growing in the third embodiment;
Fig. 23 is a view for explaining generation of a maximum range of a threshold value distribution in the third embodiment;
Fig. 24 is a view for explaining the effect of a smoothing filter in the third embodiment;
Figs. 25A and 25B show pictures of halftone images that represent principal intermediate results of the subject extraction process in the third embodiment;
Fig. 26 is a view for explaining the technique of similarity determination in the third embodiment;
Fig. 27 is a view showing an example of the threshold value distribution in the third embodiment;
Fig. 28 is a block diagram showing the arrangement of an image sensing system in the fourth embodiment;
Fig. 29 is a block diagram showing the arrangement of the image sensing system in Fig. 28 in detail;
Fig. 30 is a view for explaining segmentation/integration of regions in the fourth embodiment;
Fig. 31A is a view showing an example of an input image sensed by a binocular camera in the fourth embodiment;
Fig. 31B is a view showing the rough region division result performed based on the magnitudes of disparity vectors in the fourth embodiment;
Fig. 31C is an explanatory view showing the image extraction result by region growing (and segmentation) on the basis of secondary characteristics such as color components in the fourth embodiment; and
Fig. 32 is an explanatory view showing setting of the threshold value distribution of the fourth embodiment.

DETAILED DESCRIPTION OF THE INVENTION 


The preferred embodiments of an image extraction apparatus of the present invention will be described below with 
reference to the accompanying drawings. The image extraction apparatus of this embodiment is applied to an image 
sensing system. 

<First Embodiment>

Fig. 1 is a block diagram showing the arrangement of an image sensing system according to the first embodiment. 
This system is made up of an image sensing apparatus 1, an image extraction apparatus 2, a terminal apparatus 10,
and an image display apparatus 9. 

The image sensing apparatus 1 comprises, as its major constituting elements, image forming optics 1a including
a lens, a stop, and a lens driving controller 1e, an image sensor 1b, an image signal processor (which performs gamma
characteristic control, white balance control, exposure condition control, focusing characteristic control, and the like)
1c, an image recorder 1d, and the like.

The image extraction apparatus 2 comprises an image memory 3 including a memory 3a for temporarily storing
a subject image and a memory 3b for temporarily storing a background image, a normalized edge intensity image
extractor 4 for calculating the edge intensity of an image, and normalizing the calculated edge intensity, an initial mask
extractor 5 for initially generating a mask region for detecting a subject region, a region growing module 6 for growing
the initial mask region to an appropriate one, a contour shaper 7 for shaping the contour of the mask region, a subject
image output unit 8 for outputting the extracted image of a subject, an interface unit 11, and the like.

The extraction apparatus 2 is connected to the image display apparatus 9 such as a CRT and the terminal apparatus
10 such as a personal computer.

This system extracts a subject region including a subject alone from an image including both the background and
the subject (to be referred to as a subject image hereinafter), and displays the extracted region on, e.g., the display
apparatus 9. Upon extracting the subject region, a mask is applied to the subject image. The mask is a set of binary
data, which have "1" at positions corresponding to the subject region, and have "0" at other positions, as shown in Fig.
2. The mask is generated by the edge intensity extractor 4, the initial mask extractor 5, and the region growing module
6 shown in Fig. 1. The edge intensity extractor 4 and the initial mask extractor 5 generate an "initial mask", and the
region growing module 6 and the contour shaper 7 grow the initial mask to improve it to a mask that matches the
subject. The subject image output unit 8 applies the improved mask to the subject image (the image including both the
subject and background), and outputs image data at pixel positions of the subject image corresponding to mask values
"1", thereby outputting an image of the subject region alone.

Note that the extraction apparatus 2 may be constituted by the hardware shown in Fig. 1 or by gate arrays, or
the hardware functions of the apparatus may be implemented by a software process (e.g., the
flow chart in Fig. 3).
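
As an illustration of the masking operation performed by the subject image output unit 8, the following is a minimal software sketch, assuming images are stored as nested Python lists. The function name `apply_mask` and the `fill` value written at mask value "0" positions are hypothetical and do not come from the specification.

```python
def apply_mask(subject, mask, fill=0):
    """Keep subject pixels where the mask is 1; write `fill` elsewhere.

    `subject` and `mask` are equally sized lists of rows. The `fill`
    value for masked-out pixels is an assumption of this sketch.
    """
    return [
        [pix if m == 1 else fill for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(subject, mask)
    ]


subject = [[10, 20, 30],
           [40, 50, 60]]
mask = [[0, 1, 1],
        [0, 1, 0]]
extracted = apply_mask(subject, mask)
```

Only pixels at mask value "1" survive in `extracted`; everything else takes the fill value.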

The image extraction apparatus of this embodiment is characterized by its individual functions and a combination
thereof rather than the hardware arrangement. Accordingly, the functions of the apparatus will be explained with
reference to the flow chart since a description using the flow chart allows easier understanding.

Fig. 3 is a flow chart showing the overall subject extraction process sequence.

In steps S12 to S18 in Fig. 3, a mask for subject region extraction is initially generated, and in step S19, it is
checked if the generated mask matches the subject region. In step S30, the region of the mask is grown to improve
the mask to an appropriate one if the generated mask does not match the subject region. In step S50, the subject
region is extracted using the finally completed mask, and a subject image is output.

The process sequence of Fig. 3 will be described in turn below. 
Mask Generation

Steps S11 to S18 correspond to the mask generation procedure. 

In step S11, a background image and a subject image (including the background) are input from the image sensing
apparatus 1. Fig. 4A shows an example of the subject image, and Fig. 4B shows an example of the background image.

In step S12, image data are sub-sampled in accordance with an appropriate reduction factor to increase the
processing speed in the subsequent steps. Subsequently, in step S13, the region to be processed, i.e., the region to
be subjected to a subject image extraction process, is set on the subject image to include a range where the subject
is present. Note that the user may designate the region to be processed using a mouse (not shown) of the terminal
apparatus 10 while observing the subject image displayed on the display apparatus 9. The sub-sampling process in
step S12, and setting of the processing region in step S13 may be omitted since they are executed to increase the
processing speed in the subsequent steps.

In step S14, an edge extraction process is performed for image data of pixels corresponding to the region to be
processed set in step S13 in the image data of both the subject image and background image, thereby generating two
edge intensity images. The edge intensity images are generated to estimate a boundary using the edge images since
the brightness levels or colors sharply change in image data at the boundary between the subject region and
background region.

Note that edge extraction may use, in addition to Sobel, Prewitt, Roberts operators, and the like (Mori, Itakura,
Basics of Image Recognition (II), Chapter 15, Ohm Corp., 1990), a Canny edge detection operator (IEEE Trans. Pattern
Analysis and Machine Intelligence, Vol. PAMI-8, pp. 679 - 698, 1986), a Marr-Hildreth edge detection operator (Proc.
Royal Society of London, Vol. B-207, pp. 187 - 217, 1980), and the like.
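
The edge extraction of step S14 can be sketched as follows, assuming the Sobel operator (one of the operators named above) applied to a single-channel image stored as nested lists. The function name `sobel_edge_intensity` and the choice to leave border pixels at zero are assumptions of this sketch, not details taken from the specification.

```python
import math


def sobel_edge_intensity(img):
    """Return the Sobel gradient magnitude of a 2-D grayscale image.

    Border pixels, where the 3x3 window does not fit, are left at 0.0;
    this border handling is an assumption of the sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge between columns 1 and 2.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_edge_intensity(step_image)
```

The same function would be applied to both the subject image and the background image to obtain the two edge intensity images.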

Subsequently, in step S15, the edge intensity images obtained in step S14 are normalized. The maximum intensity
value of the edge intensity image (the maximum density value of the intensity image) extracted from the subject image
can be used as a common factor for normalization, and all the pixel values of the two edge intensity images are divided
by this common factor, thereby normalizing the two edge intensity images.
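
The common-factor normalization of step S15 can be sketched as below. The function name `normalize_pair` and the handling of an all-zero subject edge image are assumptions made for illustration.

```python
def normalize_pair(edge_subject, edge_background):
    """Divide both edge intensity images by the subject image's maximum.

    Returns (normalized_subject, normalized_background). If the subject
    edge image is all zeros, both images are returned as float copies
    without scaling (an assumption; the text does not cover this case).
    """
    peak = max(max(row) for row in edge_subject)
    divisor = peak if peak != 0 else 1

    def scale(img):
        return [[value / divisor for value in row] for row in img]

    return scale(edge_subject), scale(edge_background)


norm_s, norm_b = normalize_pair([[0, 2], [4, 1]], [[1, 0], [2, 8]])
```

Because the common factor comes from the subject image alone, normalized background values can exceed 1 where the background has stronger edges than the subject.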

However, a different normalization technique can be used for a subject image which includes many edges that
define the boundary between the subject and background, i.e., a subject image having a dense distribution of contour
lines (for example, a flower has many relatively fine partial shapes, and its image has many edges; an image having
many edges will be referred to as an "edge-rich" image hereinafter). More specifically, blocks each having a
predetermined size are set to have individual pixels of the edge intensity image of the subject as centers, and the edge intensity
value of a pixel having the maximum value in a block including a certain pixel is replaced by the intensity value of the
certain pixel. This manipulation is performed for all the pixels of the edge intensity image to attain normalization.

As another normalization technique, it is effective to use maximum intensity values in the entire images (or local
images) of the edge intensity subject image and edge intensity background image as normalization denominators for
the respective images since the influence of variations in image sensing condition can be minimized.

Figs. 5A and 5B respectively show images obtained by normalizing the edge intensity images extracted from the
subject image (Fig. 4A) and background image (Fig. 4B) using, e.g., a Sobel operator (these images will be respectively
referred to as a "normalized edge intensity subject image $P_{NESi}$" and a "normalized edge intensity background image
$P_{NEBi}$" hereinafter).

In step S16, an "edge seed" extraction process is performed from the normalized edge intensity background image
and normalized edge intensity subject image. Note that the "edge seed" is an image which has a value "1" at a position
at which the normalized edge intensity background image and normalized edge intensity subject image have
considerably different pixel values, and has a value "0" at a pixel position at which their pixel values are not considerably
different from each other. More specifically, the absolute value of the difference between a certain pixel value $P_{NESi}$ in
the normalized edge intensity subject image and a pixel value $P_{NEBi}$ in the normalized edge intensity background image
at the corresponding pixel position is calculated, and the value of the edge seed is defined at "0" at the pixel position
where the absolute value of the difference is smaller than a predetermined threshold value ($\delta_0$); the value of the edge
seed is defined at "1" at the pixel position where the absolute value is equal to or larger than the threshold value. More
specifically, if $P_{Ki}$ represents the pixels of the edge seed image,

if $|P_{NESi} - P_{NEBi}| < \delta_0$, $P_{Ki} = 0$

if $|P_{NESi} - P_{NEBi}| \geq \delta_0$, $P_{Ki} = 1$ ...(1)

Note that the threshold value $\delta_0$ may be adaptively changed in correspondence with images.
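
The thresholding rule of equation (1) can be written as a short sketch. The function name `edge_seed` and the sample values are illustrative assumptions; the inputs are taken to be the two normalized edge intensity images as nested lists.

```python
def edge_seed(norm_subject, norm_background, delta0):
    """Binary edge seed image per equation (1).

    A pixel is 1 where the absolute difference of the two normalized
    edge intensity images is at least `delta0`, and 0 otherwise.
    """
    return [
        [1 if abs(s - b) >= delta0 else 0 for s, b in zip(s_row, b_row)]
        for s_row, b_row in zip(norm_subject, norm_background)
    ]


seed = edge_seed(
    [[0.875, 0.25], [0.5, 0.125]],
    [[0.125, 0.25], [0.4375, 0.625]],
    delta0=0.25,
)
```

An adaptive threshold, as mentioned in the text, would simply pass a different `delta0` per image.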
Fig. 6 shows the extracted "edge seed" image. In this manner, the edge seed image represents the difference
between the edge intensity images. Hence, as shown in Fig. 6 or 2, the peripheral edge of the "edge seed" image 
tends to represent the contour of the subject, and its inner portion the edges of the background. 

However, since the edge seed image represents brightness edges of image data, it also includes other edges in 
the original subject image. On the other hand, some edges are erroneously determined as non-edges since they have 
a small brightness difference from the background image although they originally define the contour of the subject 
region. 

In view of this problem, this embodiment also considers differences ("color difference seed" or "color edge") of
color data. In step S17, "color difference seed" extraction is performed.

The differences between the color components (R, G, and B values or hue value) of the background image and
subject image in units of pixels are calculated. If $P_b$ represents the pixel value of the background image, $P_s$ represents
the pixel value of the subject image, and i represents an arbitrary pixel, the differences between the color components
are calculated by:

$\Delta P_{Ri} = |P_{Rbi} - P_{Rsi}|$

$\Delta P_{Gi} = |P_{Gbi} - P_{Gsi}|$

$\Delta P_{Bi} = |P_{Bbi} - P_{Bsi}|$ ...(2)

If $\varepsilon_0$ represents the threshold value common to R, G, and B components, the pixel values $P_i$ of all the pixels i that satisfy:

$\Delta P_{Ri} < \varepsilon_0$ and $\Delta P_{Gi} < \varepsilon_0$ and $\Delta P_{Bi} < \varepsilon_0$ ...(3)

are set at:

$P_i = 0$ ...(4)

On the other hand, the pixel values $P_i$ of all pixels that satisfy:

$\Delta P_{Ri} > \varepsilon_0$ and $\Delta P_{Gi} > \varepsilon_0$ and $\Delta P_{Bi} > \varepsilon_0$ ...(5)

are set at:

$P_i = 1$ ...(6)

A binary image generated in this manner is a "color difference seed image". 

When a relatively large threshold value $\varepsilon_0$ is set in inequalities (3) and (5), the influence of variations in pixel values
due to noise and image sensing condition differences can be eliminated, and a light shadow and the like can be
removed.
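
The color difference seed of step S17 can be sketched as follows, assuming per-pixel absolute differences of the R, G, and B components. Pixels that satisfy neither inequality (3) nor inequality (5) are not covered by the text; this sketch leaves them at 0, which is an assumption. The function name `color_difference_seed` is hypothetical.

```python
def color_difference_seed(subject_rgb, background_rgb, eps0):
    """Binary color difference seed from two RGB images.

    Each image is a list of rows of (R, G, B) tuples. A pixel is set to
    1 only when all three component differences exceed `eps0`
    (inequality (5)); all other pixels are set to 0 (assumption for the
    case not covered by inequalities (3) and (5)).
    """
    seed = []
    for s_row, b_row in zip(subject_rgb, background_rgb):
        seed_row = []
        for s_pix, b_pix in zip(s_row, b_row):
            diffs = [abs(sc - bc) for sc, bc in zip(s_pix, b_pix)]
            seed_row.append(1 if all(d > eps0 for d in diffs) else 0)
        seed.append(seed_row)
    return seed


subject_rgb = [[(200, 180, 160), (50, 50, 50), (200, 100, 100)]]
background_rgb = [[(100, 100, 100), (52, 49, 50), (100, 100, 100)]]
color_seed = color_difference_seed(subject_rgb, background_rgb, eps0=20)
```

Raising `eps0` makes the seed more conservative, which matches the remark that a relatively large threshold removes noise and light shadows.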

Subsequently, in step S18, an "initial seed" is extracted from the color difference seed (or color edge) detected in 
step S17 and the edge seed (or luminance edge) detected in step S16. Note that the "initial seed" image is formed by 
combining the color difference seed and edge seed: 

Initial Seed = Color Difference Seed + Edge Seed 

Since the initial seed is a binary image of 0s and 1s, it can serve as a mask. A region of "1"s formed by the seed portion
will be referred to as a "mask region" hereinafter for the sake of convenience. Since it is initially checked if the initial
seed is proper as a mask, "initial" is added to its name. If the initial seed is not proper, a growing process is done based
on its "seed".
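
Combining the two seeds in step S18 can be sketched as below. Reading the "+" in the formula as a per-pixel logical OR, so that the result stays binary, is an interpretation made for this sketch; the function name `combine_seeds` is hypothetical.

```python
def combine_seeds(color_seed_img, edge_seed_img):
    """Per-pixel union of two binary seed images.

    The union (logical OR) keeps the result binary; treating the "+"
    of the text as OR is an assumption of this sketch.
    """
    return [
        [1 if (c == 1 or e == 1) else 0 for c, e in zip(c_row, e_row)]
        for c_row, e_row in zip(color_seed_img, edge_seed_img)
    ]


initial_seed = combine_seeds([[1, 0], [0, 0]], [[1, 1], [0, 0]])
```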
When the initial seed is edge-rich, a background noise removal process must be additionally performed in the
process of combining the color difference seed and edge seed. More specifically, points having pixel values equal to
or greater than a predetermined threshold value in the normalized edge intensity background image that corresponds
to the mask region formed by the extracted initial seed are removed. Fig. 6A shows the "initial seed (initial mask region)"
obtained by this process.
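
The background noise removal described above can be sketched as follows. The function name `remove_background_edges` and the sample threshold are illustrative assumptions.

```python
def remove_background_edges(initial_seed, norm_edge_background, threshold):
    """Clear mask pixels that coincide with strong background edges.

    A pixel of `initial_seed` is reset to 0 when the normalized edge
    intensity background image is at least `threshold` at that position.
    """
    return [
        [0 if (m == 1 and b >= threshold) else m for m, b in zip(m_row, b_row)]
        for m_row, b_row in zip(initial_seed, norm_edge_background)
    ]


cleaned = remove_background_edges(
    [[1, 1], [0, 1]],
    [[0.1, 0.7], [0.9, 0.2]],
    threshold=0.5,
)
```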

In step S19, it is checked if the initial mask region extracted in step S18 substantially matches the subject region. 

If the two regions do not match, it is determined that the initial mask set in step S18 is incomplete or is not correct,
and the flow advances to step S30 to execute a growing process of the mask region. Note that Fig. 10 shows the details
of step S30.

On the other hand, if the two regions substantially match, it is determined that the initial mask is roughly complete,
and the subject region is extracted and output using the mask in step S50.

Mask Growing 

The process for performing region growing of the initial mask region when it is determined that the initial mask
region is incomplete will be described below with reference to the flow chart in Fig. 10.

In this growing process, growing proceeds using the "seed" as the center. That is, a pixel (to be referred to as a
pixel of interest hereinafter) on the boundary of seeds in the initial mask region is compared with neighboring pixels
(or region), so as to check the similarity between the image features of the pixel of interest and the neighboring pixels.
If the similarity is higher than a predetermined threshold value, the neighboring pixels are considered as those in an
identical mask region, and are incorporated in this mask region.

In step S31, a difference threshold value $\delta_I$ for brightness and a difference threshold value $\delta_H$ for hue are set as
parameters required for checking the similarity for brightness and that for hue.

In step S32, the similarity between the pixel of interest and neighboring pixels is evaluated using the threshold
values. In this embodiment, the neighboring pixels include eight neighboring pixels. Whether or not the pixel of interest
is similar to the neighboring pixels is determined as follows. That is, the absolute difference values of image data (in
units of R, G, and B components) and the absolute difference value of hue values between the pixel of interest and
each of the neighboring pixels are calculated. If the difference values of the R, G, and B image data are respectively
equal to or smaller than the threshold value $\delta_I$, or if the absolute difference value of hue is equal to or smaller than the
threshold value $\delta_H$, it is determined that the pixel of interest and the neighboring pixel have a small difference, i.e., they
are similar to each other. More specifically, if one of the two inequalities based on the two threshold values holds, it is
determined that the neighboring pixel is a portion of the subject region, and is incorporated in an identical mask region
(step S33). That is, if $P_{iD}$ and $P_{iH}$ respectively represent the R, G, and B image data and hue value of the pixel i of
interest, and $P_{kD}$ and $P_{kH}$ represent the R, G, and B image data and hue value of a neighboring pixel k, if one of two
inequalities below holds, it is determined that the two pixels are similar to each other:

$|P_{iD} - P_{kD}| < \delta_I$ ...(8)

$|P_{iH} - P_{kH}| < \delta_H$ ...(9)
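
The similarity test of inequalities (8) and (9) can be sketched as a predicate on a pixel of interest and one of its neighbors. The function name `is_similar`, the strict comparison, and the treatment of hue as a plain number without wrap-around are assumptions of this sketch.

```python
def is_similar(rgb_i, rgb_k, hue_i, hue_k, delta_i, delta_h):
    """Return True if pixel k may be merged into the region of pixel i.

    Inequality (8): every R, G, B absolute difference is below `delta_i`.
    Inequality (9): the absolute hue difference is below `delta_h`.
    The pixels are similar if either inequality holds. Hue is compared
    linearly, ignoring circular wrap-around (an assumption).
    """
    rgb_close = all(abs(a - b) < delta_i for a, b in zip(rgb_i, rgb_k))
    hue_close = abs(hue_i - hue_k) < delta_h
    return rgb_close or hue_close


# Offsets of the eight neighboring pixels examined around a pixel of interest.
EIGHT_NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
                   (0, -1),           (0, 1),
                   (1, -1),  (1, 0),  (1, 1)]
```

A growing pass would call `is_similar` for each offset in `EIGHT_NEIGHBORS` and add the neighbor to the mask when the predicate returns True.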

The growing process in steps S32 and S33 is performed for all the pixels located at the boundary of the mask by
moving the pixel of interest within the initial mask region (i -> i', k -> k'), as shown in Fig. 11. If the condition in step
S34 to be described below is satisfied, the growing process is stopped. The stop condition is that if the region growing
process is being done in a certain direction, the density of edge seeds within a predetermined range in that direction
is smaller than a predetermined threshold value. If it is confirmed in step S34 that the condition is satisfied, the growing
in that direction is stopped.

Fig. 12 shows a result of this step. For example, in a to-be-grown region 200 present in a certain growing direction
100 (the direction is one of eight directions pointing from the pixel of interest toward eight neighboring pixels) in Fig.
12, the region to be processed is set to extend about 10 pixels from the nearest neighboring pixel. If the
number of edge difference seeds in the region 200 to be processed is two or less (i.e., no edge difference seed, or
one or two edge difference seeds are present), and the similarity between the nearest neighboring pixel and the pixel of interest
satisfies the growing condition, it is determined that the region need not be grown in that direction and the region up
to the nearest neighboring pixel is sufficient, thus stopping the subsequent growing.
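
The stop condition of step S34 can be sketched as a count of edge seeds ahead of the pixel of interest. For simplicity this sketch scans a one-pixel-wide line in the growing direction instead of a two-dimensional region 200, and it checks only the seed count, not the similarity condition; both simplifications, as well as the function name `should_stop_growing`, are assumptions.

```python
def should_stop_growing(edge_seed_img, y, x, dy, dx, extent=10, max_seeds=2):
    """Return True if growing from (y, x) in direction (dy, dx) should stop.

    Counts edge seed pixels over `extent` steps starting at the nearest
    neighbor in the growing direction. Growing stops when the count is
    `max_seeds` or fewer. Positions outside the image end the scan.
    """
    h, w = len(edge_seed_img), len(edge_seed_img[0])
    count = 0
    for step in range(1, extent + 1):
        yy, xx = y + dy * step, x + dx * step
        if not (0 <= yy < h and 0 <= xx < w):
            break
        count += edge_seed_img[yy][xx]
    return count <= max_seeds


row = [[0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]]
stop_from_left = should_stop_growing(row, 0, 0, 0, 1)   # three seeds ahead
stop_from_mid = should_stop_growing(row, 0, 5, 0, 1)    # no seeds ahead
```

The defaults of 10 pixels and two seeds mirror the example figures given in the text.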

Note that whether or not the image to be processed is an edge-rich image may be determined, and such growing
stop function may be automatically effected if it is determined that an edge-rich image is to be processed, or such
function may be added depending on the user's decision.

Fig. 7B shows the grown mask region. The region growing process of this embodiment can be understood from 
comparison between Figs. 7A and 7B. 

Subsequently, in step S35, a hole filling process for filling "holes" which may potentially exist in the grown region
is executed. Such holes also exist in a non-grown portion after the region growing process. The maximum size of the
hole to be filled may be input in advance to the system of this embodiment, or the user may determine it based on the
region growing result.

Fig. 8A shows an example of the mask region that has been subjected to the hole filling process. In step S35, a
"whisker" removal process or indent correction process for correcting any indent of the contour may be added as an
option in addition to the hole filling process.
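
The hole filling of step S35 can be sketched with a flood fill over the zero-valued pixels of the mask. Treating a "hole" as a 4-connected group of zeros that does not touch the image border, and the function name `fill_holes`, are assumptions of this sketch; the specification does not state the connectivity.

```python
from collections import deque


def fill_holes(mask, max_size):
    """Fill enclosed zero regions of at most `max_size` pixels with 1.

    A zero region counts as a hole only if it does not touch the image
    border (assumption). The input mask is not modified.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if out[y0][x0] != 0 or seen[y0][x0]:
                continue
            # Collect one 4-connected component of zeros.
            component, touches_border = [], False
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and out[ny][nx] == 0):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border and len(component) <= max_size:
                for y, x in component:
                    out[y][x] = 1
    return out


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
filled = fill_holes(ring, max_size=1)
```

The `max_size` argument plays the role of the maximum hole size that the text says may be preset or chosen by the user.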

Fig. 8B shows an example of the intermediate extracted image. 

Contour Shaping 

If the user determines that the contour shape of the subject is incomplete in the mask region subjected to the
process up to step S35, a contour shaping process is performed.

Fig. 13 is a flow chart showing the contour shaping process sequence. The objective of this process is to stably
extract the subject region with a correct shape independently of the contour pattern present in the neighboring region of
the boundary between the subject image and the background image. To attain this objective, the contour shaping
process sequence uses difference data between the normalized edge intensity distributions of both the subject image
and background image (see step S16).

The second objective of the contour shaping process is to prevent the edge selection process (step S44; to be
described later) from being influenced by the edge pattern of the background present inside the edge-seed image (as in
the difference edge seed extraction in step S16), which would disturb correct subject contour extraction. To attain this
objective, the background edge image is subtracted from the subject edge image to calculate an edge intensity value,
and points (or a region) having a negative edge intensity value are removed from the difference edge data. However, the
above-mentioned objectives may be ignored, and the same process as that to be described below may be executed using
the edge intensity data of the subject image.

More specifically, as shown in Fig. 14, the contour shaping process searches for one edge point that makes up
an accurate contour line from edge candidates $P_1$, $P_2$, and $P_3$ that form an incomplete contour line.
The contents of the contour shaping process will be described in detail below.

In step S41, pixels at which the difference data between the normalized edge intensity subject image and the
normalized edge intensity background image is equal to or smaller than a predetermined threshold value (where the
threshold value > 0) are removed to leave only reliable subject edge data.

In step S42, the contour line of the grown mask region is traced to detect the tracing direction.
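
The thresholding of step S41 can be illustrated with a short sketch. It assumes that the two normalized edge intensity images are given as equally sized lists of rows, and it represents a "removed" pixel by the value 0; the function name `reliable_subject_edges` is hypothetical.

```python
def reliable_subject_edges(subject_edge, background_edge, threshold):
    """Keep the edge-intensity difference only where it exceeds a positive threshold.

    subject_edge and background_edge are lists of rows of normalized edge
    intensities. Pixels whose difference (subject minus background) is equal
    to or smaller than the threshold are set to 0, i.e. removed.
    """
    result = []
    for s_row, b_row in zip(subject_edge, background_edge):
        row = []
        for s, b in zip(s_row, b_row):
            d = s - b
            row.append(d if d > threshold else 0)
        result.append(row)
    return result
```

Because the threshold is positive, this also discards every point with a negative difference, which matches the removal of negative edge intensity values described for the second objective.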

In step S43, at each point on the contour line, the region to be subjected to the shaping process (see Fig. 14) is
set in a direction perpendicular to the detected contour tracing direction (the tracing direction is defined so that the
right-hand side of the tracing direction is always in the subject region). Data representing the traced contour line is
expressed as a function of the tracing path length (arc length s) (a length x(s) in the x-direction, and a length y(s) in
the y-direction) and, for example, it is a set of edge points. The path length to the pixel edge of interest is expressed
by x(s) and y(s). Edge candidates that should form a corrected contour line are searched for from the region to be
processed (see Fig. 14) set by the process in step S43.

If there are a plurality of edge points serving as candidates, an edge selection process is performed in step S44.
Fig. 14 explains the edge selection process in step S44. In Fig. 14, points $Q_{s-2}$, $Q_{s-1}$, $Q_s$, and the like indicated by
full circles are those already selected by the edge selection process executed so far, and points $P_1$, $P_2$, $P_3$, and the
like indicated by open circles are edge candidates to be selected in the current selection process. The processing
region includes points to be subjected to edge selection, and is set in a direction perpendicular to the contour tracing
direction, as described above.

Determination of a correct edge line, i.e., edge selection, is attained based on evaluation of the continuity of pixel
values (R, G, and B values) and evaluation of shape continuity. More specifically, evaluation of the continuity of pixel
values discriminates feature continuity $C_c$ (continuity of R, G, and B values) between an edge candidate (open circle
point) and the subject edge, and evaluation of the shape continuity discriminates continuity $C_s$ of the contour shape.
More specifically, an energy function F as a sum of the two quantities $C_c$ and $C_s$ that represent the continuities is set,
and an edge candidate having a small energy function F value is selected, so that the selected edges make up a correct
contour line.

In this embodiment, the feature continuity $C_c$ is expressed by drifts (values between inner neighboring pixels), in
the contour tracing direction, of R, G, and B features of an edge candidate on the subject side on a contour line including
the edge candidate (assume that this contour line is made up of four pixels including three already selected edge pixels
($Q_{s-2}$, $Q_{s-1}$, and $Q_s$ in Fig. 14) and the edge candidate to be connected thereto (one of $P_1$, $P_2$, and $P_3$)), and is
defined by:

$$C_c = \frac{dR}{MR} + \frac{dG}{MG} + \frac{dB}{MB} \qquad (10)$$

where dR, dG, and dB are the differences of R, G, and B components between an unknown edge candidate $P_x$ and,
e.g., two edge points $Q_s$ and $Q_{s-1}$ already determined as a contour. More specifically, dR, dG, and dB are respectively
given by:

$$dR = \Delta R_0 + \Delta R_1$$
$$dG = \Delta G_0 + \Delta G_1$$
$$dB = \Delta B_0 + \Delta B_1$$

where $\Delta R_0$ denotes the variation of the R value between points $P_x$ and $Q_s$, and $\Delta R_1$ denotes the variation of the R value
between points $P_x$ and $Q_{s-1}$. Similar notations are applied to $\Delta G_0$, $\Delta G_1$, $\Delta B_0$, and $\Delta B_1$. Also, MR, MG, and MB are the
maximum differences of R, G, and B components (differences between maximum and minimum values) between an unknown
edge candidate $P_x$ and the edge points $Q_s$ and $Q_{s-1}$. As shown in equation (10), since dR, dG, and dB are respectively
divided by MR, MG, and MB, $C_c$ represents a sum total of the normalized differences of R, G, and B components.
However, such a normalization process is not a necessary condition for evaluating the feature continuity.

An evaluation function $C_s$ of the shape continuity evaluates the curvature of the contour line in a local region
including an edge candidate. The contour line is expressed by the arc length s. Since the curvature of the contour line
is expressed by second derivatives $x_{ss}$ and $y_{ss}$ of the coordinate values x and y of pixels on the contour with respect
to s, the evaluation function $C_s$ is:

$$C_s = \sqrt{x_{ss}^2 + y_{ss}^2} \qquad (11)$$

for

$$x_{ss} = d^2x/ds^2, \quad y_{ss} = d^2y/ds^2$$

Fig. 16 shows the concept of the evaluation function $C_s$. To obtain a second derivative, three edge points are required.
Equation (11) represents:

a curvature $(C_s)_1 = \left(\sqrt{x_{ss}^2 + y_{ss}^2}\right)_1$, or

a curvature $(C_s)_2 = \left(\sqrt{x_{ss}^2 + y_{ss}^2}\right)_2$.

Hence, to maintain the continuity of the curvature is to select an edge candidate $P_i$ that satisfies $(C_s)_1 = (C_s)_2$.

Note that $C_s$ may be given by a first derivative associated with a contour line sampling point sequence including
an edge candidate of contour line data. When equation (11) is discretized using selected and non-selected data, $C_s$ is
given by the following equation (12):

$$C_s = \sqrt{\left(e_x - 3E_x(s) + 3E_x(s-1) - E_x(s-2)\right)^2 + \left(e_y - 3E_y(s) + 3E_y(s-1) - E_y(s-2)\right)^2} \qquad (12)$$

where $E_x(s)$ and $E_y(s)$ are the already determined contour line data (or already set initial values), and $e_x$ and $e_y$ are
the x- and y-coordinates of each edge candidate (i.e., one of points $P_1$, $P_2$, and $P_3$). If the contour tracing direction is
assumed to be the upper or lower direction (y-direction), since the search region of the edge candidates is set in a
direction perpendicular to the tracing direction, the $e_x$ component is the edge candidate coordinate value to be determined
(if the tracing direction is the right or left direction, $e_y$ becomes a variable factor).
The energy function F is given by:

$$F = C_c + a \cdot C_s \qquad (13)$$

where a is a factor that serves as a weighting coefficient between $C_c$ and $C_s$ (0 < a < 1), and can be considered as a
kind of regularization parameter in the regularization process. The weighting coefficient a can be appropriately selected
by the user.
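
The edge selection by the energy function of equation (13) can be sketched as follows. This is an illustrative reading rather than the definitive procedure: the "variation" of a colour component between two points is assumed to be their absolute difference, a channel whose maximum difference is zero is assumed to contribute nothing, and the names `feature_continuity`, `shape_continuity`, and `select_edge` are hypothetical.

```python
import math

def feature_continuity(candidate_rgb, qs_rgb, qs1_rgb):
    """C_c of equation (10): sum of normalized R, G, B differences.

    Assumption: the variation between two points is the absolute difference.
    """
    total = 0.0
    for ch in range(3):
        values = (candidate_rgb[ch], qs_rgb[ch], qs1_rgb[ch])
        d = abs(values[0] - values[1]) + abs(values[0] - values[2])
        m = max(values) - min(values)  # maximum difference among the three points
        if m > 0:
            total += d / m
    return total

def shape_continuity(e, q_s, q_s1, q_s2):
    """C_s of equation (12) for candidate e and contour points Q_s, Q_{s-1}, Q_{s-2}."""
    tx = e[0] - 3 * q_s[0] + 3 * q_s1[0] - q_s2[0]
    ty = e[1] - 3 * q_s[1] + 3 * q_s1[1] - q_s2[1]
    return math.hypot(tx, ty)

def select_edge(candidates, contour, a=0.5):
    """Return the candidate with the smallest F = C_c + a * C_s (equation (13)).

    candidates: list of ((x, y), (r, g, b)).
    contour: the last three selected points, oldest first, in the same format.
    """
    (p2, _), (p1, c1), (p0, c0) = contour  # Q_{s-2}, Q_{s-1}, Q_s

    def energy(cand):
        pos, rgb = cand
        return feature_continuity(rgb, c0, c1) + a * shape_continuity(pos, p0, p1, p2)

    return min(candidates, key=energy)
```

For a contour running straight upward with uniform colour, the candidate directly above $Q_s$ has $C_c = 0$ and $C_s = 0$ and is therefore selected; raising a gives shape continuity more influence relative to colour continuity.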

On the other hand, if no edge candidate is present in the local region set in step S43, i.e., no edge candidate
having high reliability is present, a contour point on the mask or a predicted edge position in the contour tracing direction
(for example, if the tracing direction is the upper direction, a point obtained by increasing the y-coordinate value by
one pixel while fixing the x-coordinate may be selected) is selected as an edge candidate point, and either the point
having the lower evaluated value of the energy function is selected or a point on the boundary of the mask is determined
as the selected edge.

After the edge selection process in step S44, a mask data smoothing process is done in step S45. In this step,
median filtering is applied to the (one-dimensional) contour data and to the two-dimensional mask data. With
this process, when the above-mentioned edge selection process (step S44) results in a still incomplete partial shape
(e.g., when a discontinuous uneven portion or dither pattern remains in the vicinity of the subject contour line as a
result of the process with no highly reliable edge candidates), such a portion can be smoothed to improve the degree of
approximation of the subject contour shape.

Note that the median filter process may be recursively applied. Needless to say, the smoothing filtering process 
is not limited to the median filter. 
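
A minimal sketch of the one-dimensional median filtering of contour data in step S45, applied recursively, is given below. It assumes a closed contour, so the filter window wraps around the ends of the sequence; the function names and the default window size are hypothetical.

```python
from statistics import median

def median_filter_closed(values, window=3):
    """Median-filter a sequence sampled along a closed contour.

    Assumption: the contour is closed, so the window wraps around the ends.
    The window size must be odd.
    """
    half = window // 2
    n = len(values)
    return [
        median(values[(i + k) % n] for k in range(-half, half + 1))
        for i in range(n)
    ]

def smooth_contour(values, window=3, passes=2):
    """Apply the median filter repeatedly (recursive application)."""
    for _ in range(passes):
        values = median_filter_closed(values, window)
    return values
```

Applied to, for example, the x-coordinates of the contour points, an isolated outlier is removed while a genuine step between two flat runs is preserved, which is why a median filter suits the removal of a discontinuous uneven portion.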

A subject image masked by executing the masking process based on mask data obtained by the contour shaping
process (step S40) is output (step S50 in Fig. 3), and the result is output to the image display apparatus 9 or a printer
serving as the subject image output unit 8.

Fig. 9 shows the subject image. 

Modification of First Embodiment

The control procedure shown in Figs. 3, 10, and 13 can be variously modified. 

For example, in step S18 in Fig. 3, if the initial seed image is an edge-rich image in the combining process of the
color difference seed image and edge seed image, the background image removal process needs to be done. More
specifically, points which have pixel values equal to or larger than a predetermined threshold value in the normalized
edge intensity background image corresponding to the mask region formed by the temporarily extracted initial seed
image are removed.

On the other hand, the initial mask region extraction process is not limited to the above-mentioned procedure
(steps S15 to S18 in Fig. 3). For example, the initial mask region may be extracted by executing a thresholding process
of statistic parameters such as correlation coefficients among blocks each of which is defined to have each pixel as
the center and has a predetermined size, or average values or standard deviations of pixel values in each block, and
the like.

Note that the image sensing system of the first embodiment is premised on the fact that image sensing is made
while the image sensing apparatus 1 is fixed to, e.g., a tripod, or image sensing is made while the exposure condition
or focusing is not automatically set. If, in the apparatus of the first embodiment, image sensing is made in the
hand-held state, or the background and subject images are sensed while automatically setting the exposure condition and
focusing, the background and subject images must be aligned. This alignment or position adjustment is attained by
executing the process of the third embodiment (steps S114 and S115; to be described later).

<Second Embodiment>

The image sensing system in the second embodiment applies the techniques of the first embodiment to the initial
mask region extraction and region growing process, but executes a contour shaping process using an "active contour
model method" (M. Kass et al., "Snakes: Active Contour Models", International Journal of Computer Vision, vol. 1,
pp. 321-331, 1987).

The active contour process is to move and deform initial contours to minimize evaluation functions to be described
later, and to finally converge the initial contours to the outline of the object or its envelope. In the second embodiment,
an active contour shaping process (step S400) is performed for a mask image (data having values "0"s and "1"s), and
the contour shaping process of the first embodiment (step S40) is performed for a subject image.

Fig. 17 is a flow chart showing the procedure of the active contour shaping process.

The active contour shaping process uses, as the start point, a mask region obtained by executing the same initial
mask region extraction process (steps S15 to S18) and region growing process (steps S31 to S35) as those in the first
embodiment.

More specifically, in step S401 in Fig. 17, an "initial contour line" is set on the basis of the obtained mask region.

This "initial contour line" is the one that serves as the starting point of the active contour process, and is set by
enlarging the boundary line of the mask region by a predetermined magnification factor with the centroid of the mask
region as the center, or is set around the mask region using a pointing/selection device such as a mouse or the like.
Since an evaluation function E to be described later is conditioned to shrink the contours after the contours are subjected
to the process, the "initial contour lines" are set to be larger than the mask, as shown in Fig. 18.

In step S400, the active contour process to be described below is done on the mask data, which is binary in
luminance level. More specifically, a contour line shape v(s) is calculated by minimizing the value of the evaluation
function E given by equations (14) below with respect to a contour line v(s) = (x(s), y(s)) expressed using a parameter s
(typically, an arc length s) that describes the coordinates of the individual points on the contour:

$$E = \int_0^1 \left( E_1(v(s)) + w_0 E_0(v(s)) \right) ds$$

$$E_1(v(s)) = \alpha(s) \left| \frac{dv}{ds} \right|^2 + \beta(s) \left| \frac{d^2 v}{ds^2} \right|^2 \qquad (14)$$

$$E_0(v(s)) = -\left| \nabla I(v(s)) \right|^2$$

where I(v(s)) is the luminance level on v(s), $\nabla$ is the differential operator, and $\alpha(s)$, $\beta(s)$, and $w_0$ are appropriately
selected by the user.

In step S402 the active contour process is recursively performed for the contour line shape v(s). That is, after a 
contour line shape v(s) is obtained by performing the process for minimizing the evaluation function E given by equations 
(14) once, the active contour process is recursively done for the obtained contour line shape v(s) to time-serially and 
sequentially deform and/or move the contour v(s). 
Each recursive step of the active contour process is processed by selecting a point that minimizes the function E
from a set of points (a neighboring region defined in advance) within the movable range at each point on the contour
line v(s), or by solving Euler equations of the contour v(s) that minimizes the evaluation function E using the calculus
of variations.

Executing the active contour process on the contour lines of the mask data is to prevent the contour lines from
erroneously converging to the background pattern (which is present in the vicinity of the subject on the subject image)
and to allow smoothing of the mask shape after region growing and correction of the contour shape of the non-grown
region. Especially, as the correction function, the active contour process is effective when a smooth, continuous shape
corresponding to a subjective contour line is generated to compensate for a lost portion of the shape.
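
One recursive step of the first variant described above (selecting, at each contour point, the position in a predefined neighborhood that minimizes E) can be sketched as a greedy update. The discretization is an assumption made for illustration: finite differences replace the derivatives of equations (14), $\alpha$ and $\beta$ are taken as constants, the neighborhood is 3 x 3 pixels, and the function names are hypothetical.

```python
def grad_sq(image, x, y):
    """Squared gradient magnitude of the image at (x, y), central differences."""
    h, w = len(image), len(image[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = (image[y][x1] - image[y][x0]) / 2.0
    gy = (image[y1][x] - image[y0][x]) / 2.0
    return gx * gx + gy * gy

def local_energy(image, prev, cur, nxt, alpha, beta, w0):
    """Discrete E_1 + w0 * E_0 at one contour point (assumed discretization)."""
    dx, dy = cur[0] - prev[0], cur[1] - prev[1]
    ddx = prev[0] - 2 * cur[0] + nxt[0]
    ddy = prev[1] - 2 * cur[1] + nxt[1]
    internal = alpha * (dx * dx + dy * dy) + beta * (ddx * ddx + ddy * ddy)
    external = -grad_sq(image, cur[0], cur[1])
    return internal + w0 * external

def greedy_snake_step(image, contour, alpha=0.1, beta=0.1, w0=1.0):
    """Move each point of a closed contour to the lowest-energy 3x3 neighbor."""
    h, w = len(image), len(image[0])
    n = len(contour)
    new = list(contour)
    for i in range(n):
        prev, nxt = new[i - 1], new[(i + 1) % n]
        best = new[i]
        best_e = local_energy(image, prev, best, nxt, alpha, beta, w0)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                x, y = new[i][0] + dx, new[i][1] + dy
                if 0 <= x < w and 0 <= y < h:
                    e = local_energy(image, prev, (x, y), nxt, alpha, beta, w0)
                    if e < best_e:
                        best, best_e = (x, y), e
        new[i] = best
    return new
```

Calling `greedy_snake_step` repeatedly until the contour stops changing corresponds to the recursive application of step S402; on binary mask data the gradient term is non-zero only near the mask boundary, so the first-derivative term is what shrinks an enlarged initial contour toward it.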

In many cases, the contour shape v(s) of the mask after the evaluation function E (equations (14)) converges to
a minimum value is considered as sufficiently close to the subject shape.

In step S403, the contour shape after convergence is set as the "initial contour line" of the subject image region.
That is, the contour line obtained for the obtained mask in steps S401 and S402 is applied to the subject image. To
attain this, in step S404, the active contour process (equations (14)) is applied to the contour of the subject image in
the same manner as in the active contour shaping process of the mask, thereby enlarging the contour of the subject
region of the subject image.

In step S405, the interior of the finally converged contour is extracted and is set as the mask region. 

Modification of Second Embodiment

In the second embodiment, not all the steps of the process shown in Fig. 17 are always indispensable.

More specifically, the active contour process on the mask data in step S402 is sometimes not indispensable, and
in such a case, steps S402 and S403 may be omitted.

Upon setting the "initial contour line" in step S401, the initial mask obtained by the process shown in Fig. 3 may
be directly used without executing any region growing process (Fig. 10). When the processes are modified in this
manner, the roughly converged contour shape is further subjected to the contour shaping process on the subject image
to extract the shape of details. Depending on the particular conditions involved, after the initial mask region is extracted,
the active contour process may be performed on the subject image.

<Third Embodiment>

The third embodiment is characterized by using a threshold value whose value is distributed in correspondence 
with the subject image in region growing. 

Fig. 19 is a block diagram showing the arrangement of an image sensing system according to the third embodiment.
In the third embodiment, two images, i.e., a subject image and a background image excluding the subject, are used as
input images as in the first embodiment.

An image sensing apparatus 1 comprises, as its major constituting elements, image forming optics 1a including a
lens, a stop, and a lens driving controller 1e, an image sensor 1b, an image signal processor (which performs gamma
characteristic control, white balance control, exposure condition control, focusing characteristic control, and the like)
1c, an image recorder 1d, an interface unit 1f, and the like.

An extraction apparatus 2 comprises an image memory 3 including memories 3a and 3b, an initial mask region
extractor 5 for extracting the prototype of a mask used for extracting the subject region, a growing module 6 for growing
an initial mask region, and a subject image output unit 8 for outputting the subject image extracted using the grown
mask. The extraction apparatus 2 is connected to an image display apparatus 9, a terminal apparatus 10, an
interface unit 11 for interfacing the image memory 3, the initial mask region extractor 5, the growing module 6, the
subject image output unit 8, the terminal apparatus 10, and the like to each other, and the like, as in the first embodiment.

Figs. 20A and 20B are flow charts showing the subject extraction process procedure according to the third
embodiment.

In step S111, a subject image and a background image are input from the image sensing apparatus 1. In step
S112, image data are subsampled in accordance with an appropriate reduction factor to increase the processing speed
in the subsequent steps. In step S113, a processing region is set to include the subject on the subject image. Note that
the subsampling process in step S112 and the processing region setting process in step S113 may be omitted. Steps
S111, S112, and S113 are substantially the same as steps S11, S12, and S13 in the first embodiment.

The system of the third embodiment receives an input subject image (Fig. 4A) and a background image (Fig. 4B) 
as in the first embodiment. 

Geometrical Transform 

Steps S114 and S115 are optional processes. More specifically, when image sensing is done in the hand-held
state without fixing the image sensing apparatus 1 to, e.g., a tripod, or the background and subject images are sensed
while automatically setting the exposure condition and focusing, the positions of the background and subject images
must be adjusted. In steps S114 and S115, position adjustment is performed. When image sensing is not performed
in the hand-held state and the exposure state and focus state are fixed, neither geometrical transform nor color adjustment
is necessary, and steps S114 and S115 can be omitted.

In step S114, parameters that express geometric transformation (e.g., affine transformation parameters) for
matching each pair of corresponding points in the subject and background images with each other, and level transformation
parameters for matching the levels of R, G, and B components, are extracted. In step S115, using the extracted
parameters, the position adjustment (shift, rotation, and magnification conversion) between the subject and background
images and level adjustment (estimation of a nonlinear function for correction using the method of least squares or the
like) of the color components (R, G, and B values) are performed.

With these processes, the subject and background images substantially match in terms of their positions and colors.
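
The level adjustment of step S115 can be illustrated by a least-squares fit. The text refers to estimating a nonlinear correction function; the sketch below deliberately simplifies this to a linear gain and offset for one colour component, fitted over sample values taken at corresponding positions in the two images. The function name `fit_gain_offset` and the choice of samples are assumptions.

```python
def fit_gain_offset(background_values, subject_values):
    """Least-squares fit of subject ~= gain * background + offset.

    Simplifying assumption: a linear correction for one colour component,
    whereas the text allows a nonlinear correction function.
    """
    n = len(background_values)
    mean_b = sum(background_values) / n
    mean_s = sum(subject_values) / n
    var_b = sum((b - mean_b) ** 2 for b in background_values)
    if var_b == 0:
        # All background samples are identical: only an offset can be estimated.
        return 1.0, mean_s - mean_b
    cov = sum((b - mean_b) * (s - mean_s)
              for b, s in zip(background_values, subject_values))
    gain = cov / var_b
    return gain, mean_s - gain * mean_b
```

The fitted gain and offset would then be applied to every pixel of the background image, once per R, G, and B component, before the difference with the subject image is taken.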

As other matching parameters, statistics such as correlation coefficients among blocks each of which is defined 
to have each point as the center and has a predetermined size, the average value or standard deviation of pixel values 
in each block, or the like may be extracted by a threshold value process. 

Initial Seed Extraction 

Subsequently, in steps S116, S117, and S118, a process for extracting an initial seed serving as a seed of a region
growing process is executed.

In step S116, a threshold value parameter for extracting an initial seed is set. This parameter may use a
predetermined value or may be input by the user. When a relatively large threshold value to be used in initial seed extraction
is set, the influence of variations in pixel values due to noise and image sensing condition differences can be eliminated,
and a light shadow and the like can be removed.

In step S117, the differences between the color components (R, G, and B values or hue, saturation) of the
background and subject images in units of pixels are calculated, and are binarized using the threshold value determined in
step S116. This binary image (an image having "0"s and "1"s alone) is an initial seed.
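
The binarization of step S117 can be sketched as follows. The text does not state how the per-component differences are combined, so taking the largest absolute difference among R, G, and B is an assumption, as is the function name `extract_initial_seed`.

```python
def extract_initial_seed(subject, background, threshold):
    """Binarize the per-pixel colour difference between two RGB images.

    Images are lists of rows of (R, G, B) tuples of equal size.
    Assumption: a pixel belongs to the seed when the largest absolute
    difference among its R, G, and B components exceeds the threshold.
    """
    seed = []
    for s_row, b_row in zip(subject, background):
        row = []
        for (sr, sg, sb), (br, bg, bb) in zip(s_row, b_row):
            diff = max(abs(sr - br), abs(sg - bg), abs(sb - bb))
            row.append(1 if diff > threshold else 0)
        seed.append(row)
    return seed
```

A larger threshold yields a smaller but more reliable seed, in line with the remark that a relatively large threshold suppresses noise and light shadows.
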

Fig. 21A shows the mask region obtained by initial seed extraction (the subject region is indicated by a black
portion). The above-mentioned process is executed by the initial mask region extractor 5 in the extraction apparatus
2 shown in Fig. 19. In place of the initial mask region extractor 5, the process may be executed by a program in a
computer of the terminal apparatus 10.

With the above-mentioned processes, the initial seed as a binary image is obtained. Fig. 21A shows the relationship
between the subject image and mask image.

Region Growing 

In a general combination of a background and subject, the region of the subject to be extracted cannot yet be
completely extracted in this process; the initial seed cannot be directly used as the mask. More specifically, when the
subject and background images have regions in which the R, G, and B levels or their local statistics (average values,
standard deviations, or the like) are similar to each other at identical positions, such partial regions remain as
non-extracted regions after the thresholding process. Hence, the subsequent region growing process (steps S119 to S125)
retrieves such regions.

In the region growing process, the similarity of image features between pixels (indicated by X in Fig. 22) and their
neighboring pixels (or region) (pixels indicated by O in Fig. 22) on the subject image corresponding to the boundary of
the initial mask is calculated, and if the calculated similarity is higher than a predetermined threshold value, the
neighboring pixels are considered as those within the subject region and are incorporated in the mask region. This process
is executed by the growing module 6 in the extraction apparatus 2 shown in Fig. 19. In place of the growing
module 6, the process may be executed by a program in a computer of the terminal apparatus 10.

The region growing of the third embodiment will be described below. The region growing is performed based on 
image data of the subject image. 

Prior to the region growing, the extraction process of the edge intensity distribution of the subject image is performed
in step S119. More specifically, the edge image of the subject image is extracted. The edge intensity distribution image
has gradation values. In step S120, the edge intensity image is binarized using a predetermined threshold value. That
is, a binary edge image is obtained. The binary edge image is used in setting a threshold value distribution (step S123;
to be described later).

In step S121, in order to limit the range of the region growing of the initial seed, the maximum region growing range
is set. This maximum range is set based on region data (coordinate data) of the initial seed. More specifically, the
maximum range is defined by a set of minimum and maximum values (..., MaxY($x_k$), ..., MaxY($x_m$), ..., MinY($x_k$), ...,
MinY($x_m$), ...) of y-coordinates at the individual points in the horizontal (x) direction of the region where the initial seed
is present, and a set of minimum and maximum values (..., MaxX($y_k$), ..., MaxX($y_m$), ..., MinX($y_k$), ..., MinX($y_m$), ...) of
x-coordinates at the individual points in the vertical (y) direction, as shown in Fig. 23.
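
The sets of minimum and maximum coordinates that define the maximum region growing range in step S121 can be computed directly from the binary seed. The sketch below is illustrative only; the function name `growing_range` and the use of `None` for columns or rows without any seed pixel are assumptions.

```python
def growing_range(seed):
    """Return (min_y, max_y, min_x, max_x) for a binary seed image.

    min_y[x] and max_y[x] are the smallest and largest y of seed pixels in
    column x; min_x[y] and max_x[y] are the smallest and largest x of seed
    pixels in row y. Entries are None where no seed pixel exists.
    """
    h, w = len(seed), len(seed[0])
    min_y, max_y = [None] * w, [None] * w
    min_x, max_x = [None] * h, [None] * h
    for y in range(h):          # rows are visited in increasing y
        for x in range(w):      # columns are visited in increasing x
            if seed[y][x]:
                if min_y[x] is None:
                    min_y[x] = y
                max_y[x] = y
                if min_x[y] is None:
                    min_x[y] = x
                max_x[y] = x
    return min_y, max_y, min_x, max_x
```

The four sequences returned here are the data that the subsequent smoothing would operate on to obtain a smooth outer boundary.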

In order to optimize the outermost contour, smoothing (using a low-pass filter) may be performed. The smoothing
uses a median filter of a predetermined size. The smoothing using the median filter can suppress abrupt variations in
the initial seed, and can provide a smooth maximum growing region roughly along the contour shape of the object, as
shown in Fig. 24. Fig. 21A shows an example of the maximum region growing range after the filter process.

The region growing of a seed in the third embodiment is attained by incorporating a point, which is located inside
a boundary point of the seed and is "similar" to the boundary point, into the seed. The similarity between the two points
is determined based on brightness and hue similarities between pixel values of a pixel (X) located on the boundary of
the seed and its neighboring pixel (O), as shown in Fig. 26. In the third embodiment, the similarity is determined based
on an absolute difference $\Delta P$ between the brightness values (or hue values) of a point on the boundary and the point
of interest. That is, the absolute difference $\Delta P$ is compared with a predetermined threshold value $\delta_I$ (or $\delta_H$), and if the
absolute difference $\Delta P$ is smaller than the threshold value, it is determined that these two points are "similar" to each
other. That is, if the following relation holds, it is determined that the two points are similar to each other:

$$\Delta P_I < \delta_I \quad \text{or} \quad \Delta P_H < \delta_H$$

The third embodiment is characterized in that the threshold value of similarity (i.e., difference) determination is
distributed. In steps S122 and S123, this distribution is determined.

In step S122, an initial value $\delta_{I0}$ or $\delta_{H0}$ of the threshold value $\delta_I$ or $\delta_H$ required for determining the similarity (i.e.,
difference) is input.

In step S123, the threshold value is variably set in one of the following three ways on the basis of the maximum growing
range determined in step S121 (also using the edge intensity distribution obtained in step S119 as needed).

In the first method, the threshold value applied to the pixels of the subject image included in the maximum growing
range is set to be large, and the threshold value for pixels outside the range is set to be small. More specifically, the
initial threshold value ($\delta_I$, $\delta_H$) is used as the large threshold value, and a value equal to 10% of the initial threshold
value is used as the small threshold value. The first method weighs higher for pixels within the maximum growing range;
in other words, the mask growing direction is constrained to be inside of the maximum growing range.

In the second method, as the distribution function of the threshold value, an arbitrary function which decreases as
the distance from the boundary line of the maximum growing range toward the outside becomes larger is used. More
specifically, since this distribution function assumes a smaller value as the pixel is farther from the contour line toward
the outside of the contour line, it tends to suppress region growing in regions farther from the contour line. This is because
the difference $\Delta P$ of the pixel of interest must be smaller than that small threshold value to incorporate the pixel of
interest in the contour and, hence, only pixels having small differences $\Delta P$ can be incorporated in the contour.

Note that the distribution function need not be continuous but may be quantized. When a quantized distribution
function is set, the threshold value assumes an identical value within the predetermined range.

Furthermore, as another distribution function, the threshold value $\delta_I$ may be independently set in the vertical and
horizontal directions ($\delta_{Ix}$, $\delta_{Iy}$). In this case, inside the maximum growing range, as the pixel is farther from the contour
line in the vertical (y) direction, $\delta_{Iy}$ is set at a larger value; as the pixel is farther from the contour line in the horizontal
(x) direction, $\delta_{Ix}$ is set at a larger value.

In the third method, the threshold value distribution function is set based on the edge distribution in the image
(binary image obtained in step S120) obtained by binarizing the edge intensity distribution of the subject image by the
predetermined threshold value. More specifically, the value of the threshold value distribution function is set at a small
value at the position of an edge and its neighboring positions. Also, the distribution function values are set so that the
function assumes the smallest value at the edge position and increases slightly in correspondence with the distance
from the edge at the neighboring position of that edge. For example, if the function assumes a value "0" at the edge
position, the region growing in a direction crossing the edge is perfectly inhibited. On the other hand, the distribution
function may be set to assume a uniformly small value at the edge position and its neighboring positions.

Fig. 27 shows an example of the distribution function set by the third method. In Fig. 27, the bold solid line indicates
the threshold value distribution, and the thin solid lines indicate the distribution of edges. In this example, two edges
(400, 401) are detected. A threshold value distribution function 300 assumes small values $\delta_{low}$ in the vicinity of the
edges 400 and 401, and assumes larger values $\delta_{high}$ as the pixel position is farther from the vicinity of the edges. As
a result, contour growing is suppressed in the vicinity of the edge.
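
A one-dimensional threshold distribution of the kind described for the third method can be sketched as follows. The linear ramp from $\delta_{low}$ up to $\delta_{high}$ and the parameter `slope` are assumptions chosen for illustration; the text only requires the smallest value at the edge and an increase with distance from it.

```python
def threshold_profile(length, edge_positions, delta_low, delta_high, slope):
    """Threshold value at each position 0..length-1 along one image line.

    The threshold equals delta_low at an edge, rises linearly with the
    distance to the nearest edge (assumed ramp), and is capped at delta_high.
    """
    profile = []
    for x in range(length):
        if edge_positions:
            dist = min(abs(x - e) for e in edge_positions)
            profile.append(min(delta_high, delta_low + slope * dist))
        else:
            profile.append(delta_high)  # no edge: use the large value everywhere
    return profile
```

With `delta_low` set to 0 and a strict less-than similarity test, no pixel lying on an edge can be incorporated, which reproduces the case in which growing across the edge is completely inhibited.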

Note that the boundary line of the maximum region growing range may be displayed on the display apparatus to
be superposed on the input image, and the user may set an appropriate smoothing filter size based on the displayed
range.

Subsequently, in step S124, the similarity (difference) between a pixel on the contour line and its neighboring pixel
is determined. Especially, in this embodiment, if the absolute difference values $|\Delta P_{R,G,B}|$ of R, G, and B components
between the pixel on the contour line and its neighboring pixel become equal to or smaller than a threshold value, or
the absolute difference value of hue becomes equal to or smaller than a threshold value, it is determined in step S125
that the pixel on the contour line is similar to that neighboring pixel, and the neighboring pixel is incorporated in an
identical subject region. Fig. 25A shows an example of the mask subjected to region growing by the method of the
third embodiment.
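
The region growing of steps S124 and S125 with a position-dependent threshold can be sketched as follows. Several points are assumptions made for illustration: a single brightness channel is used instead of R, G, and B or hue, the threshold is read at the position of the neighboring pixel, eight neighbors are examined, and the function name `grow_region` is hypothetical.

```python
from collections import deque

def grow_region(image, seed, thresholds):
    """Grow a binary seed over a brightness image with a threshold map.

    A neighboring pixel is incorporated when its absolute brightness
    difference to an already incorporated pixel is equal to or smaller
    than the threshold stored at the neighbor's position.
    """
    h, w = len(image), len(image[0])
    mask = [row[:] for row in seed]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x])
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    while queue:
        y, x = queue.popleft()
        for dy, dx in neighbors:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                if abs(image[ny][nx] - image[y][x]) <= thresholds[ny][nx]:
                    mask[ny][nx] = 1
                    queue.append((ny, nx))
    return mask
```

Passing a threshold map produced by any of the three setting methods changes only the `thresholds` argument; a small value at a given position blocks growth into that position unless the brightness difference is correspondingly small.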

According to any one of the threshold value setting methods, robustness and stability of the region growing with
respect to the initial threshold value ($\delta_I$, $\delta_H$) can be achieved (variations in shape along the contour shape of the subject
are small). Even when the maximum growing range is different from the outer contour shape of the subject, they may
be roughly matched upon approximately setting the threshold value ($\delta_I$, $\delta_H$).

Furthermore, a hole filling process for automatically filling holes having a predetermined size or less in the
region-grown mask data is executed (step S126).

This hole filling process is performed independently of the subject image data, i.e., the similarities, uniformities,
or the like of the image features to neighboring regions, and is performed for binary mask data. The grown region
obtained as a result of the above-mentioned process is used as a subject extraction mask region, and the corresponding
region is extracted from the subject image (step S127). The extracted image data (or an image file) is output to the
display (step S128), thus ending the extraction process (see Fig. 25B).
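
The hole filling of step S126 operates on binary mask data only, which the following minimal sketch tries to show. The function name `fill_small_holes`, the use of 4-connectivity for background components, and the definition of a hole as a background component that does not touch the image border are assumptions, not details given in the text.

```python
from collections import deque

def fill_small_holes(mask, max_hole_size):
    """Fill enclosed background components of at most max_hole_size pixels.

    mask is a list of rows of 0/1 values; the input is not modified.
    """
    h, w = len(mask), len(mask[0])
    filled = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] or seen[sy][sx]:
                continue
            # Collect one connected background component by breadth-first search.
            component, touches_border = [], False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border and len(component) <= max_hole_size:
                for y, x in component:
                    filled[y][x] = 1
    return filled
```

No image features are consulted, which matches the statement that the process is independent of the subject image data.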

Modification of Third Embodiment

The features used in similarity determination are not limited to the above-mentioned R, G, and B values or hue
value. For example, features obtained by statistically processing low-order features such as saturation, higher-level
features such as the partial shape (the direction of segment) or local spatial frequency of a local line segment including
an edge, and low-level features such as R, G, and B values and the like are preferably used.

The incorporation process of the region growing is not always limited to eight neighboring pixels, but a neighboring 
region obtained by another method may be used. 
In the subject image extraction process (step S127), the subject image corresponding to the mask may be extracted
after the smoothing process or correction process of the boundary line of the mask region. The extracted image output
process (step S128) is performed by the subject image output unit 8 in the extraction apparatus 2 shown in Fig. 19. In
place of the subject image output unit 8, the process may be executed by a program in a computer of the terminal
apparatus 10.

The extraction apparatus 2 of the third embodiment may be implemented by various modes in addition to the
above-mentioned hardware arrangement, such as one implemented by programs shown in the flow charts in Figs. 14
and 15, one implemented by gate arrays, one built in the image sensing apparatus 1, and the like.

Note that the threshold value δ_L or δ_H may be automatically set on the basis of the statistics such as an average
value, standard deviation, or the like, of the differences (difference absolute values) associated with the individual
parameters between the background and subject images.
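
One possible way to derive a threshold from such statistics is sketched below. The text does not give a formula; the rule "mean plus k times the standard deviation", the name `auto_threshold`, and the weight `k` are assumptions chosen only to make the idea concrete.

```python
import statistics

def auto_threshold(background, subject, k=1.0):
    """Hypothetical rule: mean + k * standard deviation of the absolute differences.

    background and subject are equal-length sequences of one parameter
    (for example, one color component) sampled at the same pixel positions.
    """
    diffs = [abs(s - b) for b, s in zip(background, subject)]
    return statistics.mean(diffs) + k * statistics.pstdev(diffs)
```

For example, absolute differences of 2, 2, 4, and 4 have a mean of 3 and a population standard deviation of 1, so `k=1.0` gives a threshold of 4.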

<Fourth Embodiment>

Fig. 28 is a block diagram showing the arrangement of an image sensing system according to the fourth
embodiment. In Fig. 28, reference numeral 51 denotes an image input apparatus which corresponds to an image sensing
apparatus or an image data base unit. The image sensing apparatus is not particularly limited, but a video camera, a
binocular camera, or the like may be used.

Reference numeral 52 denotes a primary feature extractor for extracting primary feature data of an image; and
53, a characteristic uniformity evaluator for evaluating the uniformity of the feature data. Reference numeral 54 denotes
an image memory; 55, a region segmentator for segmenting an image into a plurality of regions on the basis of the
uniformities of the feature data; 56, a divided image generator; and 57, a growing module based on the secondary
feature of an image. Reference numeral 58 denotes a display apparatus. Reference numeral 59 denotes a pointing
device (e.g., a mouse) for designating the segmented region to be selected.

In the fourth embodiment, region growing is attained in such a manner that the uniformity of features in a
predetermined region in an image is evaluated, and regions are segmented or integrated so that pixels having uniform
features belong to a single region. This embodiment is characterized in that primary feature data in an image are
extracted, and the above-mentioned region segmentation of the image is done based on the extracted distribution,
thereby roughly extracting image regions which may be extracted. In the next step, to attain fine extraction, region
growing based on secondary features (those having different types and attributes from those of the primary feature
data) using region candidates as seeds is performed in the same manner as in the third embodiment. Assume that the
regions extracted based on the primary feature data include image information required for performing region growing
based on the secondary feature data.

In the fourth embodiment, the secondary feature is basically different from the primary feature, but is not
limited to particular data such as data having a geometric structure or high-order features obtained by processing
a brightness distribution and color components. For example, color component information or the like
is conveniently used. The secondary feature may have the same type as the primary feature (in this embodiment, a
motion vector or disparity vector) as long as the image can be segmented more finely by region growing. For example,
the secondary feature may differ from the primary feature only in resolution.

The objective of detecting the primary feature data is to allow the operator to make rough selection and designation
upon extracting the image regions to be specified in practice or to facilitate automatic extraction processing since
high-speed extraction of regions that serve as growing seeds in the region growing process can be attained.

As the primary feature data, when time-serial image data are input from the image input apparatus 51, the motion
vector distribution of the individual points on the screen is used, or when images input from a multi-eye camera are
used, the disparity vector distribution of corresponding points between right and left images is used. The primary feature
extractor 52 may have slightly lower extraction precision than the precision (resolving power) of a secondary feature
extractor 571 used subsequently, but may preferably attain high-speed extraction. For this purpose, dedicated hardware
for extracting primary feature data and combining uniform regions may be set.

Fig. 29 is a block diagram showing the detailed arrangement of the image sensing system shown in Fig. 28. Note
that the detection algorithm of disparity vectors (motion vectors) is not the gist of the present invention, and a detailed
description thereof will be omitted (see Yachida, "Robot Vision", Seikodo, and the like).

In the fourth embodiment, the uniformity of feature data is expressed by a variance associated with the magnitude 
and direction of a primary feature (motion vector, disparity vector, or the like) within the predetermined region. 

By appropriately setting the size of the region (block) to be evaluated on the basis of the size of an object in the 
frame, the processing time can be shortened, and high efficiency can be realized. More specifically, a segmented
region about a fraction to 1/10 of the size of the object in the frame is typically used. This value may be appropriately 
set in advance by the operator. 

The region segmentator 55 determines that the region is uniform when the uniformity value (for example, the
variance of features in that region) is smaller than a predetermined threshold value, i.e., the variance is small. In Fig.
30, D_0 to D_5 are regions respectively including uniform features. If, for example, the uniformity representative values
of the regions D_0 and D_1 are nearly equal to each other within a predetermined allowable range, i.e., D_0 ≈ D_1, the regions
D_0 and D_1 can be connected to each other. If the region D_4 has a variance of the features (D_40 and D_41) falling outside
the allowable range, and D_40 ≈ D_3 and D_41 ≈ D_5, the region D_4 is divided into regions D_40 and D_41, and the regions D_40
and D_3, and regions D_41 and D_5 can be respectively connected to each other. With this process, uniform massive
regions that can be represented by primary features (disparity vectors, motion vectors, or the like) of constant values
are formed.
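
The uniformity test and the connection of regions with nearly equal representative values can be sketched as follows. This is a simplified illustration under stated assumptions: each primary feature is reduced to a scalar (for example, a disparity magnitude), regions are arranged in one dimension so that only consecutive regions are adjacent, the representative value is taken to be the mean, and the names `is_uniform` and `merge_adjacent_regions` are hypothetical.

```python
import statistics

def is_uniform(features, variance_threshold):
    """A region is uniform when the variance of its features is below the threshold."""
    return statistics.pvariance(features) < variance_threshold

def merge_adjacent_regions(regions, tolerance):
    """Connect consecutive regions whose mean feature values differ by at most tolerance.

    regions is a list of lists of scalar feature values, ordered so that
    consecutive entries are spatially adjacent (an assumption of this sketch).
    """
    if not regions:
        return []
    merged = [list(regions[0])]
    for region in regions[1:]:
        if abs(statistics.mean(merged[-1]) - statistics.mean(region)) <= tolerance:
            merged[-1].extend(region)
        else:
            merged.append(list(region))
    return merged
```

A region that fails `is_uniform` would first be divided into sub-regions (as D_4 is divided into D_40 and D_41) before the merge is applied; the division step is not shown here.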

This process can be considered as one of so-called region growing processes. However, in this process, the region 
growing conditions (restraints) as in the third embodiment are not given. 
Fig. 31A shows an example of an input image sensed by a multi-eye camera, Fig. 31B shows the rough region
segmentation result based on the sizes of disparity vectors, and Fig. 31C is an explanatory view showing the image
extraction result extracted by region growing (and division) based on secondary feature data such as color components.

Fig. 31B shows initial seeds extracted according to the primary features. In the example shown in Fig. 31B, three
regions are extracted depending on the absolute disparity values. As shown in Fig. 31B, the initial seeds need not
always accurately reflect the actual shape of an object as in the first embodiment, but no background region is preferably
mixed in the divided regions (initial seeds). For this reason, after region division (after extraction of initial seeds), regions
reduced at a predetermined ratio or erosion results of masks using a morphological operator may be used as the
segmented regions (initial seeds).
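
Erosion with a morphological operator, mentioned above as a way to keep background out of the initial seeds, can be sketched as below. The 3x3 square structuring element, the treatment of pixels outside the image as background, and the name `erode` are assumptions of this illustration.

```python
def erode(mask):
    """Binary erosion with a 3x3 square structuring element.

    A pixel survives only if it and all eight neighbors are set; positions
    outside the image are treated as background (assumed convention).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ):
                out[y][x] = 1
    return out
```

Shrinking the seed by one pixel on every side removes the band most likely to contain mixed background, at the cost of a slightly smaller starting region for the subsequent growing.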

On the other hand, when a background region is partially mixed in the initial seed region, that initial seed region
is divided based on, e.g., color components, and region growing is performed after small regions including the contour
line of the initial seeds are deleted, so that the subject shape can be extracted more accurately.

The segmented image (initial seed image) generator 56 assigns image data of different attributes (e.g., different
colors, different hatching patterns, or the like) to a plurality of regions divided on the basis of primary feature data, and
displays these regions on the display apparatus 58 to be superposed on the input image. More specifically, the generator
56 performs a process for labeling the segmented regions, and painting identical label regions by unique patterns (or
colors).

With this process, the operator can easily visually confirm the region candidate to be designated, and can easily
designate the object to be extracted using the pointing device 59. As the pointing device 59, a mouse (or a tablet) is
normally used. When the image includes only one moving object or only one region having a disparity value falling
within the predetermined range, no designation/selection is required. All of a plurality of extracted initial seeds may be
used as region growing seeds.

In the example shown in Fig. 31B, a region 500 at the lower left end is selected.

The region growing module 57 comprises a secondary feature extractor 571 for extracting secondary feature data
(e.g., hue, R, G, and B values, and the like) from the vicinity of the initial seed regions which are extracted and selected
based on the primary feature data, a region growing module 572 based on the secondary feature data, a region
integrator (hole filling means) 573 for connecting the grown regions, a maximum region growing range setting unit 574,
and a threshold value distribution setting unit 575 for similarity evaluation of the secondary features with neighboring
pixels.

The processing contents of the setting units 574 and 575 are the same as those in the third embodiment. As a
threshold value distribution setting method unique to this embodiment, for example, discontinuous portions of primary
feature factors (disparity vectors, motion vectors, or the like) in the initial seed region combined by designation/selection
(or automatic process) are equivalently processed as those of secondary feature factors (R, G, and B values, hue, and
the like), and the threshold value for similarity evaluation on such portions (and neighboring portions) may be set to
be small.

Fig. 32 is an explanatory view showing setting of the threshold value distribution according to the fourth
embodiment. Note that a disparity edge means a portion where the rate of change in disparity vector becomes larger than a
predetermined threshold value in a neighboring region including that point. In this case, when a so-called edge intensity
distribution in an image (the intensity distribution or a distribution obtained by applying a differential operator such as
a Sobel operator to the intensity distribution of color components) and the disparity edge are observed at an identical
point, the lowest threshold value (δ_low) is set; when either one of these edges is observed, a middle threshold value is
set; and when neither edge is observed, and the region does not belong to any regions in the vicinity of an edge, a
high threshold value (δ_high) is set.
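
The three-level rule described for Fig. 32 can be written as a short sketch. The function name `select_threshold` and the symbol `delta_mid` for the middle threshold value are hypothetical; the two boolean flags are assumed to already be true for points in the vicinity of the respective edge, not only on it.

```python
def select_threshold(intensity_edge, disparity_edge, delta_low, delta_mid, delta_high):
    """Pick the similarity threshold for one point from two edge indicators.

    Both edges present -> lowest threshold; exactly one -> middle; none -> high.
    """
    if intensity_edge and disparity_edge:
        return delta_low
    if intensity_edge or disparity_edge:
        return delta_mid
    return delta_high
```

Agreement between the two edge types is treated as the strongest evidence of an object boundary, so growth is most strongly suppressed there.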

Though various embodiments which include a binocular camera have been described, the present invention is
not limited to them. It may be applied to a system using a multi-eye camera.

As many apparently widely different embodiments of the present invention can be made without departing from 
the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof 
except as defined in the appended claims. 

Claims 

1. An image extraction method for extracting image data of an object from a first image that records both a background
and the object to be extracted, using a mask, characterized by comprising:

the first step (S18) of generating an initial mask for extracting an image of the object on the basis of difference
data between the first image and a second image that records the background alone;
the second step (S31~S35) of growing a region of the generated initial mask on the basis of a similarity
between a feature of a first region of the first image corresponding to the initial mask, and a feature of a second
region in the vicinity of the first region; and

the third step (S50) of extracting the image data of the object from the first image on the basis of the grown
mask region.

2. The method according to claim 1, wherein the first step includes the step (S16) of using as the initial mask a binary
image region obtained by a binarization process of difference data representing a difference between image data
of the first and second images using a predetermined threshold value.

3. The method according to claim 2, wherein the difference data represents a brightness difference between the first
and second images.

4. The method according to claim 2, wherein the difference data represents a color difference between the first and
second images.

5. The method according to claim 1, wherein the first step comprises:

the step of obtaining a first binary image region by a binarization process of data representing a brightness
difference between the first and second images using a predetermined threshold value;

the step of obtaining a second binary image region by a binarization process of data representing a color
difference between the first and second images using a predetermined threshold value; and

the step of generating the initial mask by combining the first and second binary image regions. 

6. The method according to claim 1, wherein the second step includes the step (S32, S33) of checking based on
brightness and hue similarities between the first and second regions if a pixel in the second region is to be
incorporated in the first region, and growing the mask region upon incorporating the pixel.

7. The method according to claim 1, wherein the second step (S34) comprises:

the step of respectively extracting first and second edge intensity images from the first and second images;
the step of calculating an edge density on the basis of data representing a difference between the first and
second edge intensity images; and

the step of suppressing growing of the mask when the calculated edge density is not more than a predetermined 
threshold value in a growing direction. 

8. The method according to claim 2, wherein the first step (S15) comprises:

the step of normalizing the difference data representing the difference between the first and second images, 
and generating the initial mask on the basis of normalized brightness difference data. 

9. The method according to claim 2, wherein the first step (S15) comprises:

the step of extracting first and second edge intensity images representing edge intensities of the first and
second images, respectively; and

the step of normalizing both the first and second edge intensity images using a predetermined normalization
coefficient when the first edge intensity image is an image having a small number of edges, the normalization
coefficient being a maximum intensity value of the first edge intensity image.

10. The method according to claim 2, wherein the first step (S15) comprises: 

the step of extracting first and second edge intensity images representing edge intensities of the first and 
second images, respectively; and 

the step of normalizing both the first and second edge intensity images using a maximum edge intensity value 
within a predetermined size region having a predetermined point of the first edge intensity image as a center 
when the first edge intensity image is an image having many edges. 

11. The method according to claim 6, wherein the second step (S32) includes the step of comparing differences
between brightness and hue values of the first and second regions with predetermined threshold values, and
determining that the second region is similar to the first region when the differences are smaller than the predetermined
threshold values.

12. The method according to claim 6, wherein the second region includes eight neighboring pixels of a pixel in the first 
region. 

13. The method according to claim 1, wherein the second step further comprises the fourth step of shaping a contour
line of the grown mask. 

14. The method according to claim 13, wherein the fourth step (S42~S44) comprises: 

the step of detecting the contour line of the grown mask; 

the step of generating an edge intensity image representing a difference between the first and second images; 
the step of setting a region having a predetermined width in a direction perpendicular to an extending direction 
of the contour line in the edge intensity image; 

the step of selecting a plurality of pixels of the edge intensity images in the region of the predetermined width 
as contour point candidates; and 

the step of selecting one contour point on the basis of continuity between a pixel on the contour line and the 
plurality of contour point candidates, thereby shaping the contour line of the mask. 

15. The method according to claim 14, wherein the continuity is determined by inspecting pixel value continuity. 

16. The method according to claim 14, wherein the continuity is determined by inspecting shape continuity. 

17. The method according to claim 14, wherein the continuity is determined by inspecting continuity with a pixel present
inside the contour line. 

18. The method according to claim 14, wherein the continuity is determined by weighting and evaluating pixel value 
continuity and shape continuity. 

19. The method according to claim 13, wherein the fourth step further includes the step (S45) of smoothing the shaped
contour line. 

20. The method according to claim 13, wherein the fourth step comprises:

the active contour shaping step of recursively executing a process for deforming or moving a contour shape 
of the mask to minimize a predetermined evaluation function on the basis of the initial mask or a contour of the 
grown mask, and image data of the first image. 

21. The method according to claim 20, wherein the active contour shaping step (S402) comprises: 

generating a contour line by performing an active contour shaping process of data of the initial mask, and 
performing an active contour shaping process of the image data of the first image on the basis of the generated 
contour line. 

22. An image extraction method comprising: 
the partial region extraction step (S117~S121) of extracting a partial region as a portion of a subject to be
extracted from an input image;

the region growing step (S122~S125) of growing the extracted partial region using the extracted partial region
as a seed by thresholding a similarity to a neighboring region using a threshold value, the threshold value
being set on the basis of a feature distribution at individual points of the input image; and

the extraction step (S127) of extracting an image of the subject on the basis of the region-grown region. 

23. The method according to claim 22, wherein the partial region extraction step includes the step (S117) of extracting
the partial region on the basis of a difference between a background image excluding the subject, and a subject
image including the subject.

24. The method according to claim 22, wherein the feature distribution is an edge distribution of the subject. 

25. The method according to claim 22, wherein the feature distribution is a distribution within a maximum growing
range set based on the partial region.

26. The method according to claim 22, wherein the threshold value is set to assume a value that suppresses growing 
of the region at an edge position as compared to a non-edge position. 

27. The method according to claim 25, wherein the threshold value is set to assume a value that promotes growing
of the region in a region within the maximum growing range, and to assume a value that suppresses growing of 
the region outside the maximum growing region. 

28. The method according to claim 25, wherein the maximum growing range is obtained as an output when a shape
of the partial region is smoothed using a smoothing filter having a predetermined size.

29. The method according to claim 22, wherein the input image includes time-serial images, and the partial region 
extraction step includes the step of extracting the partial region on the basis of difference data between image 
frames at different times of the input image. 

30. The method according to claim 22, wherein the input image includes a plurality of images from a plurality of different 
view point positions, and the partial region extraction step includes the step of extracting the partial region on the 
basis of a disparity distribution between the input images. 

31. An image extraction apparatus for extracting, from a first image including both a background and an object to be
extracted, image data of the object using a mask, characterized by comprising: 

temporary storage means for receiving and temporarily storing the first image and a second image that records
the background;

initial mask generation means for generating an initial mask of an extraction region on the basis of difference
data between the stored first and second images;

region growing means for growing a region of the initial mask on the basis of a feature similarity to a neighboring
region; and

first image extraction means for extracting the image data of the object from the first image on the basis of the
grown mask region.

32. The apparatus according to claim 31, wherein said initial mask generation means comprises:

first seed extraction means for extracting a color difference seed by performing a threshold value process of
a difference between color components at individual points of the first and second images;

second seed extraction means for respectively extracting first and second edge intensity images from the first
and second images, and extracting an edge difference seed by performing a threshold value process of
difference data between the first and second edge intensity images; and

generation means for generating an initial seed on the basis of outputs from said first and second seed
extraction means.

33. The apparatus according to claim 32, wherein said first image extraction means comprises: 
means for extracting difference edge data between the first and second images; and

means for shaping a contour line of the object on the basis of the extracted difference edge data. 

34. The apparatus according to claim 33, wherein said shaping means comprises: 

threshold value process means for performing a threshold value process of the difference edge data or edge 
data of the first image; 

first continuity evaluation means for evaluating shape continuity for edge candidates that remain after the 
threshold value process; 

second continuity evaluation means for evaluating continuity of image features of the edge candidates;

edge selection means for selecting one of the edge candidates on the basis of outputs from said first and 
second continuity evaluation means; and 

smoothing means for smoothing an extracted contour line or a correction mask region including the contour
line.

35. The apparatus according to claim 33, wherein said region growing means comprises:

means for determining a condition for suppressing a growing process; and
means for determining a similarity.

36. An image extraction apparatus comprising: 

partial region extraction means for extracting a partial region as a portion of a subject to be extracted from an 
input image; 

region growing means for growing the extracted partial region using the extracted partial region as a seed by
processing a similarity to a neighboring region using a threshold value, the threshold value being set on the
basis of a feature distribution at individual points of the input image; and

extraction means for extracting an image of the subject on the basis of the region-grown region. 
